Agile Exception Processes – What to do when bad stuff happens to good projects
September 13, 2007
When caught in a fire or another urgent situation, it is useful to have emergency equipment on hand and to know how to use it. The same goes for Project Exception Processes: when something untoward happens, that is not the best time to be creating new processes to deal with the event and explaining how to use them. Emotions are high, people respond to bad news differently, and it is better to follow a rehearsed, agreed-upon procedure than to figure out new rules.
Project tolerances and exception plans provide an agreed-upon emergency plan for when bad stuff happens to good projects. They act as guardrails that help prevent us from going off track, and they provide a mutually understood, agreed-upon resolution process. Just as an emergency is not the best time to collaborate on improvising a rope ladder, a major project scope change is not the best time to define a resolution process between project stakeholders.
We will look at the two components (Tolerances and Exception Plans) individually and then examine how they work together. Project tolerances are the guardrails: the upper and lower boundaries the project stakeholders are willing to tolerate for a given project metric. Another way to think of it is how much slack rope we have as a project team to do our own thing (or hang ourselves with). Tolerances can be set on a variety of metrics, and the degree of variation allowed will depend upon the individual risk tolerances of the collective stakeholders. Some projects might be very time critical; others are more concerned with budget or user satisfaction.
The graph below shows a budget tolerance set at +10% (red) and -20% (green) of the predicted spend rate.
The blue line is tracking the actual spend and in August it has exceeded the +10% upper tolerance that would trigger the Exception Plan.
(Significant project underspend can be a problem too, hence the lower green threshold. Returning a large portion of unspent funds at the end of a project represents a lost opportunity: had the money been returned earlier, it could have funded other profitable projects during this project’s lifetime.)
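For anyone tracking this in a script rather than a spreadsheet, the budget band described above can be sketched as follows. This is a minimal illustration, not a prescribed method: the function name and the monthly figures are made up for the example, and only the +10% and -20% limits come from the graph description.

```python
def budget_status(planned, actual, upper_pct=10.0, lower_pct=20.0):
    """Classify cumulative actual spend against a tolerance band
    around the planned (predicted) spend.

    upper_pct and lower_pct mirror the +10% / -20% example;
    the function itself is a hypothetical helper.
    """
    upper = planned * (1 + upper_pct / 100)
    lower = planned * (1 - lower_pct / 100)
    if actual > upper:
        return "over"    # upper tolerance breached: trigger the Exception Plan
    if actual < lower:
        return "under"   # significant underspend is a problem too
    return "within"


# Hypothetical cumulative spend figures, for illustration only.
planned = {"Jun": 300_000, "Jul": 400_000, "Aug": 500_000}
actual = {"Jun": 310_000, "Jul": 430_000, "Aug": 560_000}

for month in planned:
    print(month, budget_status(planned[month], actual[month]))
```

With these invented numbers, only August lands outside the band, which is the month that would trigger the Exception Plan.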
If this all sounds a little command-and-control prescriptive and not very agile, then perhaps we need to take a broader view of what effective agile tolerances and exception plans could look like.
The graph above shows Iteration Velocity, which tracks Story Point completion per iteration. It is a good indicator of throughput, and tolerances can be used to trigger further investigation if it drops below a certain level. In our example the lower tolerance is set to 65 story points per iteration from iteration #5 onwards (the red line). While you might think super-high velocity is good, I would also want to investigate if velocity appears to skyrocket, since this could be a sign of gaming (fudging) the metrics. (When things appear to be too good to be true, they usually are.)
Sponsor confidence is another metric we could set tolerances for. It is quite common to canvass sponsors for a simple “Green”, “Yellow”, or “Red” flag each iteration. While the odd Yellow, or even a single Red, might not be cause for notifying the steering committee (or whatever group of stakeholders we wish to engage), a persistent trend of Yellows or Reds indicates something is clearly wrong and we need to intervene. The model used in this example keeps a running score of Greens, Yellows, and Reds. A Green earns +1, a Yellow 0, and a Red -1. If the total ever drops below -2, then we need to investigate.
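The velocity check described above could be sketched like this. The lower limit of 65 points from iteration 5 comes from the example; the "suspicious jump" rule (a 50% rise over the previous iteration) is purely my own assumption for illustration, as are the function name and the sample numbers.

```python
def velocity_flags(velocities, lower=65, start_iteration=5, spike_ratio=1.5):
    """Return (iteration, reason) pairs worth investigating.

    velocities: story points completed per iteration, index 0 = iteration 1.
    spike_ratio is an assumed heuristic, not part of the original example.
    """
    flags = []
    for i, points in enumerate(velocities, start=1):
        # The lower tolerance only applies once the team has ramped up.
        if i >= start_iteration and points < lower:
            flags.append((i, "below lower tolerance"))
        # A sudden jump over the previous iteration may indicate gamed metrics.
        if i > 1 and points > spike_ratio * velocities[i - 2]:
            flags.append((i, "suspicious jump"))
    return flags


# Hypothetical velocities for iterations 1 to 8.
print(velocity_flags([40, 55, 60, 70, 72, 60, 75, 130]))
```

Early iterations below 65 are deliberately ignored, since the tolerance in the example only applies from iteration 5 onwards.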
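The running score is simple enough to sketch in a few lines. The scoring values and the -2 threshold are the ones described above; the function name and sample flags are hypothetical.

```python
SCORES = {"Green": 1, "Yellow": 0, "Red": -1}


def first_breach(flags, threshold=-2):
    """Return the 1-based iteration at which the running total first
    drops below the threshold, or None if it never does."""
    total = 0
    for iteration, flag in enumerate(flags, start=1):
        total += SCORES[flag]
        if total < threshold:
            return iteration
    return None


# Hypothetical sponsor flags, one per iteration.
print(first_breach(["Green", "Yellow", "Red", "Red", "Red", "Red"]))
```

Because the score is cumulative, a single Red after a run of Greens does not raise the alarm, while a persistent run of Reds does.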
A similar approach can be used for User Satisfaction. Tracking application satisfaction via simple Smiley faces (Happy, Neutral, Sad) and keeping a cumulative score can be useful for tracking mood and identifying trends and issues.
Cycle Time (the time taken for something to move through the system) is an important metric for agile projects. Reducing cycle time (for developing new features or correcting defects) is a main objective when creating efficient processes. In the example above we are plotting average bug fix time, from detection through to confirmed resolution. The tolerance threshold was set at an average of 12 hours, and we can see that it was breached in iterations 1 and 4, but since then it has been generally reducing (which is a good thing).
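As a rough sketch, the average bug fix time for an iteration could be computed from detection and resolution timestamps as shown here. The 12-hour tolerance is from the example above; the function name and the timestamps are invented for illustration.

```python
from datetime import datetime
from statistics import mean

TOLERANCE_HOURS = 12  # average fix-time tolerance from the example


def average_fix_hours(bugs):
    """bugs: list of (detected, resolved) datetime pairs for one iteration."""
    durations = [
        (resolved - detected).total_seconds() / 3600
        for detected, resolved in bugs
    ]
    return mean(durations)


# Hypothetical bugs from a single iteration.
bugs = [
    (datetime(2007, 9, 3, 9, 0), datetime(2007, 9, 3, 17, 0)),  # 8 hours
    (datetime(2007, 9, 4, 9, 0), datetime(2007, 9, 5, 5, 0)),   # 20 hours
]

avg = average_fix_hours(bugs)
print(avg, "breached" if avg > TOLERANCE_HOURS else "ok")
```

Note that this measures elapsed clock time; whether to count only working hours is a choice each team would need to agree on.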
Breached Tolerance vs. Forecast Breached Tolerance
Ideally we do not wait until a tolerance has been breached before taking action. If a breach can be forecast from a trend or new information, then we can be proactive about dealing with it. Just as we should not rely on bouncing off the guardrails to steer our car down the road, we can go faster and be more effective by steering before tolerances are encountered.
An Exception Plan is an agreed-upon set of actions to be taken if a tolerance is breached or, more usefully, if a breach of tolerance is forecast. It does not have to be formal; it can just say something along the lines of “we will call a stakeholder meeting”, but typically an exception report with the following information is generated:
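One simple way to forecast a breach from a trend is to fit a straight line through the recent values and project it forward. The sketch below assumes a linear trend and an upper tolerance, which will not suit every metric; the function name, the three-iteration horizon, and the sample data are all assumptions for illustration.

```python
def forecast_breach(values, limit, horizon=3):
    """Fit a least-squares line to values (one per iteration, at least two)
    and return the first future iteration (1-based) whose projected value
    exceeds the upper limit, looking at most `horizon` iterations ahead.
    Returns None if no breach is projected."""
    n = len(values)
    xs = range(1, n + 1)
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    for x in range(n + 1, n + horizon + 1):
        if intercept + slope * x > limit:
            return x
    return None


# Hypothetical percentage overspend per iteration, against a +10% tolerance.
print(forecast_breach([1, 3, 4, 6, 7], limit=10))
```

A forecast like this is only a prompt for a conversation; new information from the team will often be a better early warning than any fitted line.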
What has happened – a brief description of what occurred
Why it happened – some explanation of the events that led to the situation
Options available to us now – options to sort out the issue
Recommended Action – Stakeholders want to hear about solutions, not just problems, so we should include the team’s recommended solution
A request for the stakeholder to decide on the course of action
A review of the tolerance values to see if they need updating.
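If exception reports are kept alongside the metrics, the items listed above map naturally onto a small record type. This is only a sketch; the class, field names, and completeness check are hypothetical and not part of any standard template.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ExceptionReport:
    what_happened: str                    # brief description of what occurred
    why_it_happened: str                  # events that led to the situation
    options: List[str]                    # options available to us now
    recommended_action: str               # the team's recommended option
    decision: Optional[str] = None        # filled in once stakeholders decide
    tolerance_changes: Dict[str, str] = field(default_factory=dict)

    def is_ready_to_send(self) -> bool:
        """A report is complete when every section is filled in and the
        recommendation is one of the listed options."""
        return bool(
            self.what_happened
            and self.why_it_happened
            and self.options
            and self.recommended_action in self.options
        )


# Hypothetical report for the budget breach example.
report = ExceptionReport(
    what_happened="August spend exceeded the +10% budget tolerance",
    why_it_happened="Unplanned contractor hours",
    options=["Reduce scope", "Request additional funding"],
    recommended_action="Reduce scope",
)
print(report.is_ready_to_send())
```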
The main point is to discuss the issue and do something about it. Plan what needs to be done and then follow through to ensure it occurs. Continue tracking the metrics and determine if the actions are working. If similar problems occur again, consider trying something else; evolve, adapt, and overcome.
Why Bother with Tolerances, Why Not Just Adapt?
Why track metrics and have a process for sorting things out if they go off track? It might seem easier to just ask everyone to try their best and let the project run its course. Unfortunately, sub-groups rarely evolve into totally symbiotic ecosystems.
Developers tend to want to work on cool technology, users want every bell and whistle, QA folks want to get things right, and project managers just want to get the thing done. While these opposing forces are at play, Tolerances and Exception Reports provide a high-visibility, high-transparency way of keeping us all honest and pulling in the same direction within the guardrails of project tolerances.
Also, the very act of exploring tolerances with project stakeholders is extremely valuable. Learning where their issues and sensitive areas are really helps with project decision making. So, even if metrics and tolerances are not outwardly graphed, understanding their thresholds is important.
(Update: You can download an Excel spreadsheet with these example graphs below:)