Estimates are difficult to obtain and are subject to in-group and expert biases. Delphi can be used to overcome some of the effects of bias and result in more accurate estimates. The Delphi Method was developed at the RAND Corporation during the 1950s to provide better estimates for their work for the U.S. Military. It is now widely used in project management and for business forecasting.
The name is derived from the Oracle of Delphi of ancient Greek mythology. The method, also known as Estimate-Talk-Estimate (ETE), is a structured communication technique that relies on a group of experts. It has also been adapted for use in face-to-face meetings, e.g., Planning Poker, which is often used in Scrum to provide story point estimates for product backlog items.
Delphi combines the benefits of expert information and analysis with elements of the wisdom of crowds, so, to some extent, it is a variation of “crowd-sourcing”.
The tutorial below provides a summary of the concepts in this lesson with some worked examples. You may wish to view it before or after reading through the lesson.
Slide Deck: Delphi Method | Excel Model: Delphi.xlsx
The Delphi method can be used to provide a subjective estimate of any quantity, for example, a cost estimate, a task or activity time estimate, a probability estimate, a budget estimate, and so on. It can be used when there is insufficient historical data and quantitative methods are not feasible, or it can be used to supplement quantitatively derived forecasts.
The technique requires a group – or panel – of at least three experts. A facilitator presents the quantity to be estimated along with any known background information. Estimates are initially provided anonymously by each expert, either in a meeting or individually. It is critical that the estimators remain anonymous and provide their estimates privately, without communicating with each other, to avoid biases such as in-group bias.
After the initial estimates have been provided to the facilitator, the panel of experts discusses them in a meeting, and the providers of the highest and lowest estimates are asked to “defend” their figures and explain how they arrived at an “outlying” estimate. Most likely they possess information that is not known to the others, or have made assumptions that may or may not be correct. After all, there must be a reason their estimates are outliers: do they know something the others don’t? Information is exchanged and discussed in an open forum, often in a time-boxed manner.
Once the discussion time is over, a second set of estimates is provided – again, anonymously and individually, but this time the estimators can take the new information into account. Likely the estimates will slowly converge.
There are several ways in which Delphi can end – the second estimates are averaged or the Estimate-Talk-Estimate continues for some number of rounds or until there is consensus. Again, it is important that the estimation be done individually and anonymously so that the estimates are not biased.
Each of n estimators provides a first estimate \(E_{i}\) for a set of first estimates
\(E^{(1)} = \{E_{1},E_{2},...,E_{n}\}\).
Estimates \(max(E^{(1)})\) and \(min(E^{(1)})\) are inspected.
Next, each of the n estimators provides a second estimate \(E_{i}^{'}\) for a set of second estimates
\(E^{(2)} = \{E_{1}^{'},E_{2}^{'},...,E_{n}^{'}\}\).
The final estimate is the mean of \(E^{(2)}\):
\(E=\bar{E}^{(2)}=\frac{1}{n}\sum_{i=1}^{n}(E_i^{'})\)
If there is a relatively large number of estimators (more than 15), then the effect of outlying estimates can be mitigated by calculating the median rather than the mean of the second (or later) set of estimates.
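As a minimal sketch (the function name and values are illustrative, not part of the lesson's Excel model), the aggregation step could be written in Python, with the median available for larger panels:

```python
# Sketch of aggregating a final round of single-band Delphi estimates.
from statistics import mean, median

def aggregate(estimates, use_median=False):
    """Combine the final round of estimates into a single value.

    The median is preferred for larger panels (more than ~15 experts)
    because it dampens the effect of outlying estimates.
    """
    return median(estimates) if use_median else mean(estimates)

second_round = [120, 100, 80, 125, 120]  # illustrative estimates in hours
print(aggregate(second_round))                    # mean: 109
print(aggregate(second_round, use_median=True))   # median: 120
```

Note how the median (120) differs from the mean (109) here; with only five estimators the mean is the usual choice.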
Each additional round of estimation generally reduces the variance and can uncover new information, although at the expense of taking more time.
In a single-band Delphi, each estimator provides a single estimate – their “best estimate” or what they consider “most likely”.
Let’s say we need to estimate the time it takes to bring a new information system online and there’s no prior history as it has not been done before. So, the project manager might decide to ask five experts (some combination of system architects, data engineers, and business architects, among others) to help come up with an estimate.
The project manager sends out a request for an estimate (perhaps via email or some other messaging channel) and provides all known information. Each estimator uses their background, expertise, experience, along with their own research to provide a “best guess” estimate. The table below lists the estimates (in hours).
Estimator | Round I |
---|---|
Sarah | 82 |
Chris | 91 |
Rahul | 35 |
Pedro | 120 |
Walison | 100 |
The project manager then convenes a brainstorming session to discuss the estimates and asks Pedro and Rahul to “defend” their estimates and explain how they came up with estimates that are higher or lower than everyone else’s.
Once information has been exchanged and discussed, the project manager asks them all to go off on their own and come up with another estimate using the new information. The next set of estimates are shown below; the first set of estimates is discarded.
Estimator | Round I | Round II |
---|---|---|
Sarah | 82 | 120 |
Chris | 91 | 100 |
Rahul | 35 | 80 |
Pedro | 120 | 125 |
Walison | 100 | 120 |
Once the project manager receives the second set of estimates, they are averaged, and the final estimate for the time it takes to bring the system online is 109 hours.
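The arithmetic can be checked with a few lines of Python (a quick verification, not part of the lesson's materials):

```python
# Averaging the five second-round estimates (in hours) gives the
# final single-band Delphi estimate.
round2 = {"Sarah": 120, "Chris": 100, "Rahul": 80, "Pedro": 125, "Walison": 120}
final_estimate = sum(round2.values()) / len(round2)
print(final_estimate)  # 109.0
```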
The estimate for this item can now be used in planning and budgeting. This is most often sufficient, but the estimate does not include any kind of information about how certain or uncertain the estimators are in their estimate or the level of risk that is present in using the estimate. Wide-Band Delphi takes uncertainty into account by asking for a three-point estimate from each expert.
In the first round, each of n estimators provides a tuple of three estimates
\(T_{i}=\{O_{i}, P_{i}, B_{i}\}\),
where O represents an “optimistic” estimate, P represents a “pessimistic” estimate, and B represents a “best guess” estimate.
For each tuple, calculate
\(E_{i}=\frac{O_{i} + P_{i} + 4 B_{i}}{6}\),
resulting in the set
\(E^{(1)} = \{E_{1},E_{2},...,E_{n}\}\)
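The per-expert combination step can be sketched as a small Python function (the function name is illustrative):

```python
# Combine one expert's three-point estimate into a single value:
# E = (O + P + 4B) / 6, with a weight of 4 on the best guess.
def combine(optimistic, pessimistic, best_guess):
    return (optimistic + pessimistic + 4 * best_guess) / 6

# Example using one tuple from the worked example later in the lesson:
print(round(combine(60, 150, 82), 1))  # 89.7
```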
Estimates \(max(E^{(1)})\) and \(min(E^{(1)})\) are inspected for new information.
Next, each of n estimators provides a second tuple of three estimates
\(T_{i}^{'}=\{O_{i}^{'}, P_{i}^{'}, B_{i}^{'}\}\),
where O’ represents a revised optimistic estimate, P’ represents a revised pessimistic estimate, and B’ represents a revised most likely estimate.
For each revised tuple, calculate
\(E_{i}^{'}=\frac{O_{i}^{'} + P_{i}^{'} + 4 B_{i}^{'}}{6}\),
which will result in the set of second estimates
\(E^{(2)} = \{E_{1}^{'},E_{2}^{'},...,E_{n}^{'}\}\)
The final estimate is the mean of \(E^{(2)}\):
\(E=\bar{E}^{(2)}=\frac{1}{n}\sum_{i=1}^{n}(E_i^{'})\)
Similar to single-band Delphi, if there are many estimators, then we can use the median rather than the mean and go through additional rounds of estimate-talk-estimate. This will generally reduce the variance of the estimates.
Note that rather than calculating a simple average of each estimate tuple, the method calculates a weighted average with a weight of 4 placed on the “best guess”. This weighting, the same one used in the PERT technique, is based on the assumption that estimates follow a beta distribution.
The weights can also be adjusted, but note that the denominator must then be adjusted as well: it is always the sum of the weights. A more general expression of a tuple estimate is then:
\(E_{i}=\frac{w_OO_{i} + w_PP_{i} + w_BB_{i}}{w_P+w_O+w_B}\),
where \(w_j\) are the weights for each type of estimate j.
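The generalized formula can be sketched in Python; note how the default weights reduce it to the standard formula (the function name and alternate weights are illustrative):

```python
# Generalized weighted three-point estimate. The denominator is always
# the sum of the weights; with w_O = w_P = 1 and w_B = 4 this reduces
# to the standard E = (O + P + 4B) / 6.
def weighted_estimate(O, P, B, w_O=1, w_P=1, w_B=4):
    return (w_O * O + w_P * P + w_B * B) / (w_O + w_P + w_B)

print(weighted_estimate(80, 180, 120))         # default weights
print(weighted_estimate(80, 180, 120, w_B=2))  # lighter weight on the best guess: 125.0
```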
Revisiting the example for single-band Delphi, let’s say that we ask each expert to provide a three-point estimate. The estimate tuples for the first round of estimates are shown below, along with each combined estimate:
Estimator | O | P | B | E |
---|---|---|---|---|
Sarah | 60 | 150 | 82 | 89.7 |
Chris | 75 | 120 | 91 | 93.2 |
Rahul | 30 | 45 | 35 | 35.8 |
Pedro | 80 | 180 | 120 | 123.3 |
Walison | 90 | 120 | 100 | 101.7 |
The combined estimate in column E was calculated with:
\(E_{i}=\frac{O_{i} + P_{i} + 4 B_{i}}{6}\)
Verify them yourself to make sure you know how to calculate them.
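One way to verify the E column is to recompute it with a short script (a quick check, not part of the lesson's materials):

```python
# Recompute the E column of the first-round table using
# E = (O + P + 4B) / 6 for each expert's (O, P, B) tuple.
tuples = {
    "Sarah": (60, 150, 82),
    "Chris": (75, 120, 91),
    "Rahul": (30, 45, 35),
    "Pedro": (80, 180, 120),
    "Walison": (90, 120, 100),
}
for name, (O, P, B) in tuples.items():
    print(name, round((O + P + 4 * B) / 6, 1))
```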
After the first round of estimates has been obtained, the facilitator convenes a meeting. Rahul has the lowest estimate and Pedro has the highest, so each of them are asked to share the information they used to derive those estimates.
Using the new information, each expert provides a second tuple of estimates and we calculate the combined estimate again for each tuple.
Estimator | O | P | B | E |
---|---|---|---|---|
Sarah | 80 | 150 | 100 | 105.0 |
Chris | 80 | 150 | 100 | 105.0 |
Rahul | 70 | 120 | 80 | 85.0 |
Pedro | 90 | 150 | 120 | 120.0 |
Walison | 120 | 150 | 130 | 131.7 |
We can either continue with the estimate-talk-estimate cycle or, for expediency, calculate the average of the second set of estimates.
The average estimate is 109.3 hours, which is quite similar to the single-band Delphi estimate.
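The full second-round calculation can be reproduced in a few lines (a quick verification, not part of the lesson's materials):

```python
# Combine each expert's second-round (O, P, B) tuple and average the
# results to get the final wide-band Delphi estimate.
round2 = {
    "Sarah": (80, 150, 100),
    "Chris": (80, 150, 100),
    "Rahul": (70, 120, 80),
    "Pedro": (90, 150, 120),
    "Walison": (120, 150, 130),
}
combined = [(O + P + 4 * B) / 6 for O, P, B in round2.values()]
final = sum(combined) / len(combined)
print(round(final, 1))  # 109.3
```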
Note that some experts revised their estimates only a little, some a lot, and some in opposite directions: each expert weighs the shared information differently.
The variance, \(\sigma^2\), is based on the distance between the average optimistic and average pessimistic estimates and is an indicator of how certain the panel of experts is about its aggregate estimate.
Given n optimistic estimates \(O = \{O_1,O_2,...,O_n\}\) and n pessimistic estimates \(P = \{P_1,P_2,...,P_n\}\), let
\(\bar{O} = \frac{1}{n}\sum_{i=1}^{n}{O_i}\), and
\(\bar{P} = \frac{1}{n}\sum_{i=1}^{n}{P_i}\)
Then the variance is calculated as:
\(\sigma^2=(\frac{\bar{P} - \bar{O}}{6})^2\)
The standard deviation σ is the square root of the variance:
\(\sigma=\sqrt{\sigma^2}\)
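These two formulas translate directly into a short Python sketch (the function name is illustrative):

```python
# Panel standard deviation from the average optimistic and pessimistic
# estimates: sigma = (P_bar - O_bar) / 6.
def panel_sigma(optimistic, pessimistic):
    O_bar = sum(optimistic) / len(optimistic)
    P_bar = sum(pessimistic) / len(pessimistic)
    return (P_bar - O_bar) / 6

# Second-round O and P values from the worked example:
sigma = panel_sigma([80, 80, 70, 90, 120], [150, 150, 120, 150, 150])
print(round(sigma, 2))  # 9.33
```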
For the example above, \(\bar{O} = 88\) and \(\bar{P} = 144\), so the standard deviation is \(\sigma = (144 - 88)/6 = 9.33\).
If the variance is large, there is a wide range between the optimistic and pessimistic estimates, which generally implies that the estimators do not have much confidence in their estimates and that information needed to make more accurate estimates is missing. The user of a high-variance estimate will need to adjust for the increased risk.
Rather than reporting an estimate as a single number, which is often assumed to be “accurate” and which does not contain any information about the degree of certainty, it is best to provide a range.
The “95% Confidence Interval” can be calculated using the aggregate estimate E and the standard deviation σ in this way:
\(CI=E\pm({1.96\times\sigma})\)
The value 1.96 is the number of standard deviations from the mean to encompass 95% of the normal distribution and is called the “z-score”.
With the example from above, the standard deviation σ = 9.33 and so the 95% confidence interval is from 91 to 128 hours.
Other confidence levels require different values and some of the most important ones are summarized below:
Confidence Level | z-score |
---|---|
99% | 2.58 |
95% | 1.96 |
90% | 1.65 |
80% | 1.28 |
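Putting the estimate, standard deviation, and z-scores together, the confidence interval calculation can be sketched as (the function name is illustrative):

```python
# Confidence interval around the aggregate estimate: E ± z·σ,
# using the z-scores from the table above.
Z = {0.99: 2.58, 0.95: 1.96, 0.90: 1.65, 0.80: 1.28}

def confidence_interval(E, sigma, level=0.95):
    half_width = Z[level] * sigma
    return E - half_width, E + half_width

# Wide-band example: E = 109.33 hours, sigma = 9.33 hours.
low, high = confidence_interval(109.33, 9.33)
print(round(low), round(high))  # 91 128
```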
About precision: An estimate or confidence interval that is given at a higher precision is not any more accurate. It doesn’t make much sense to say that the estimate is 109.333 hours rather than reporting 109 hours. It is a “guess” so adding precision does not make the guess any more correct.
We have seen that Delphi is a powerful way to produce estimates when historical data is absent or unreliable. Wide-band Delphi additionally takes “risk” and “confidence” into account.