To some people, providing estimates as a range of values seems a strange and unsatisfying way of conducting business. They just want to know how much something will cost, not how much it may or may not cost. That is reasonable if the object or service is ready for use, but not if it has yet to be created. The more uncertainty involved in the process, the more variability there will be. Now consider this conversation:
Customer: How much will it cost for a taxi ride from the library to the airport?
Taxi driver: Probably between $20 and $25, depending on traffic.
Customer: OK, that sounds fair, let’s go.
This seems more reasonable because we understand the variability of traffic. Unfortunately, not all stakeholders understand the variability caused by the evolving requirements and changing technology often associated with software projects. However, the uncertainties of software development are real, and we have an obligation to report estimates as ranges to help manage expectations.
Problems with single point estimates
If we say something will cost $2.5M and then try to explain that, well, actually it is more likely to be $2.8M, sponsors are likely to get upset. We could just say it is likely to cost $3M, but this is a disservice to the project and the business too. The project could well be turned down, and it hides an implicit contingency ($0.5M) that is not being shared with the business.
Optimistic, Pessimistic and Expected Values
Estimation best practice recommends determining optimistic, pessimistic, and expected values for estimates. Why do three times the work? Why not just estimate how long we think it is most likely to take? Unfortunately, life exhibits a skewed, triangular distribution: when things go bad, they go really bad!
Let’s use a real example. Driving to work usually takes me 20 minutes (I know, I am lucky). If I speed and run a few lights, perhaps I could reduce this to 15 minutes. Yet if there is an accident or heavy snowfall, the trip can easily take 40 minutes, an hour, or longer. The Overs always more than compensate for the Unders, and this is true in software development too. If we hit a technical snag or have to redo something, the extra time lost is usually far greater than any savings gained.
In the table above, the sum of our expected values may be 22 days, but the mean indicates 27 days is more likely. This is why relying on Expected values alone is so dangerous.
“Great, I will have the low estimate please!”
It is all well and good presenting estimates as ranges, but if sponsors gravitate to the low end of the estimate, we need a way to explain how unlikely that outcome is. Fortunately, we have the work of Mark Durrenberger and others to draw on, which gives us percentage probabilities of achieving any particular value in the range.
(I have omitted much of the mathematical commentary from this post to focus on the general process. However, if you want to read more behind how the process works see here.)
Given optimistic, expected and pessimistic values for estimates we can calculate activity variances and project standard deviations. We can then use these numbers to defend our estimate ranges. Here is how it works:
Calculate the Mean, the activity Variances, and the Project Standard Deviation using the standard PERT formulas:

Mean = (Optimistic + 4 × Expected + Pessimistic) / 6
Variance = ((Pessimistic − Optimistic) / 6)²
Project Standard Deviation = √(sum of the activity Variances)
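As a sketch of how these calculations fit together, here is a short Python example using the standard PERT formulas. The task values below are hypothetical, chosen only for illustration; they are not the figures from the sample project's table:

```python
import math

# Hypothetical three-point estimates in days: (optimistic, most likely, pessimistic)
tasks = [
    (2, 4, 9),
    (3, 5, 14),
    (4, 6, 12),
    (1, 2, 6),
]

def pert_mean(o, m, p):
    # PERT weighted mean: the most-likely value carries 4x weight
    return (o + 4 * m + p) / 6

def pert_variance(o, m, p):
    # PERT activity variance: one-sixth of the optimistic-to-pessimistic spread, squared
    return ((p - o) / 6) ** 2

project_mean = sum(pert_mean(*t) for t in tasks)
project_sd = math.sqrt(sum(pert_variance(*t) for t in tasks))

print(f"Sum of most-likely values: {sum(m for _, m, _ in tasks)} days")
print(f"Project mean: {project_mean:.1f} days")
print(f"Project standard deviation: {project_sd:.1f} days")
```

Note how the project mean comes out higher than the sum of the most-likely values: the long pessimistic tails pull it up, which is exactly the Overs-beat-Unders effect described earlier.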
This Project Standard Deviation allows us to defend our estimate ranges. We have a 50% chance of achieving the Mean value, and for each Standard Deviation above or below the mean we can predict the probability of achieving that value.
So with our sample project we have a 50% probability of achieving the Mean value of 27 days. As we move one Standard Deviation (3.8, rounded to 4) below the mean, we have only a 16% probability of completing in 23 days (27 − 4), and only about a 3% chance of completing in 19 days (27 − 2 × 4). To determine a more realistic value, we could look to the Mean plus one Standard Deviation: this value of 27 + 4 = 31 days should have an 84% probability of occurring.
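Assuming the project total is approximately normally distributed (which a sum of many independent task estimates approaches), these probabilities can be read off the normal CDF. A minimal sketch, using the worked example's mean of 27 and rounded Standard Deviation of 4:

```python
import math

def completion_probability(days, mean, sd):
    # P(total <= days) under a normal distribution: the standard normal CDF
    z = (days - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sd = 27, 4  # the sample project's mean, with the 3.8 SD rounded to 4
for days in (19, 23, 27, 31, 35):
    print(f"{days} days: {completion_probability(days, mean, sd):.0%}")
```

The exact normal-CDF values at two Standard Deviations (about 2% and 98%) differ slightly from the rounded 3% and 97% figures quoted in the text, but the shape of the argument is the same.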
By using this analysis of variance we can defend our estimate ranges. If someone asks for the low end of our estimate range, we can explain that we have less than a 3% chance of hitting it and that a more realistic value lies in the 50% to 84% range. Alternatively, you could save the debate and quote only the 50% to 97% values (Mean to Mean plus two Standard Deviations); this hides much of the range of possible outcomes, and may therefore give the false impression of a more precise estimate, but it could help in some circumstances.
Applying Analysis to Bad Data
A word of caution before we get caught up in the cleverness of our arguments. If our estimates are just rough approximations or guesses, then layering fancy math on top of them does not make them any better and is likely to instill a false sense of accuracy. A guess is a guess: rather than stating we have an 84% chance of achieving some wild-assed-guess value, it is more representative to take the 50% and 97% values and predict that the likely costs fall somewhere in that region.
In an earlier post I discussed Don Reinertsen’s idea of “no early finishes” for tasks on engineering projects. He reckons people are more likely to fine-tune and tweak work that is completed early, so we never get any Unders, meaning only the 50%-and-up values are likely.
Other Agilists assert that estimation on agile projects is so fraught with problems and variations in process that it is futile, and that attempts to apply advanced mathematics and analysis to agile estimates are even more flawed. Personally, I share their concerns, but I work in a world where sponsors need an idea of likely project costs. I think that is a reasonable request, and so I try to balance answering their questions with explaining the likely variation and the problems of software estimation.
The key point is to frequently checkpoint spend and progress on the project and then update the forecasts to complete. Initial estimates contain wide variations, but we often quickly gain access to reliable re-estimation data.