Our previous Inc. post discussed how to decompose conflicting policy advice into two separate components: differences about facts versus differences about values. Let's define a fact as a statement that is true or false independent of what we think or want, and subject to empirical validation. It is also useful to distinguish between existing facts (which historians record, as well as scientists) and future facts, like tomorrow's temperature in location X or the price of oil a year from now.
Clearly there will be uncertainty about future facts, but existing facts can be uncertain too, such as the depth of the ocean in location Y yesterday, or all the facts that lawyers try to discover in lawsuits. Whenever the facts (past, present or future) are uncertain due to our limited knowledge, experts may differ in their estimates. That's when the question of how to integrate or weigh these varying expert opinions about facts becomes relevant.
After having broken Humpty Dumpty into facts vs. values, the question remains: How do you recombine the components? First, this should be done only after you have thoroughly examined, tested and reconciled the reasons why the experts differ over either factual matters or value issues.
Second, when decision time arrives, you may need to handle any remaining factual disputes differently from value differences that still persist. Debates about facts can, in principle, be settled empirically (even if time and cost may not allow that to be done easily) and as such they fall in the realm of science. Disputes about values, such as how important one objective or goal is over another, can be debated endlessly and may ultimately fall in the realm of moral philosophy.
So, let's talk about the facts, or more precisely, about how to aggregate divergent expert opinions about factual matters, such as predictions about a future election or the price of oil next year. Ideally, one's political or moral views should not influence such estimates, just as the chance of rain (or of your favorite sports team winning its next match) should rationally be independent of your own hopes, desires and values.
Aggregating Estimates About Factual Matters
For factual disputes, a weighted-averaging process makes sense unless it can clearly be shown that one group member has a superior perspective and knowledge base. This person would presumably have influenced the views of others already in the group discussion, so taking a group average will already capture some of this, but with dilution.
Consider, for example, an exercise we did with senior executives from a large pharma company. We showed the 30 people a blind map of Europe with just the contours of the countries drawn, but without city and river markings, or any names listed. This map was then superimposed on a grid that showed, in the background, a 0-100 coordinate system so that each person could individually estimate the location of a country's capital. For example, the center of London might be guessed to be at point (23, 52) whereas Vienna would likely have a lower vertical score (Y) but a larger horizontal (X) value.
We used this exercise to demonstrate the wisdom-of-the-crowd effect and indeed, when we averaged the group's estimates for each city's location, those guesses were closer to the true location of each capital than nearly all individual estimates. But there was one person, named James, who actually beat the wisdom of the crowd. Interestingly, just before starting this exercise, several participants commented that James would probably do best at this quiz since he was known for his geographic acumen, having traveled the globe widely with gusto.
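The scoring behind such an exercise can be sketched in a few lines of Python. The coordinates below are invented for illustration (they are not the data from our session), and the function names are our own. The sketch averages the guesses for one city and counts how many individuals ended up farther from the true location than the group average did.

```python
import math

# Hypothetical data: these grid coordinates are invented for illustration only.
true_location = (24.0, 52.0)
guesses = [
    (20.0, 50.0), (28.0, 55.0), (25.0, 47.0),
    (18.0, 58.0), (30.0, 49.0), (24.0, 52.1),
]

def crowd_estimate(points):
    """Average the horizontal and vertical coordinates separately."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def count_beaten_by_crowd(points, truth):
    """Count individuals whose guess lies farther from the truth than the group average."""
    crowd_error = math.dist(crowd_estimate(points), truth)
    return sum(1 for p in points if math.dist(p, truth) > crowd_error)

crowd = crowd_estimate(guesses)
print("crowd estimate:", crowd)
print("crowd error:", math.dist(crowd, true_location))
print("individuals beaten:", count_beaten_by_crowd(guesses, true_location), "of", len(guesses))
```

In this made-up sample the group average beats five of the six guessers, while the one unusually accurate guesser still comes out ahead of the crowd.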
This example illustrates a conundrum of the averaging approach. We know that the average often beats most of the individuals whose judgments contributed to it, but not necessarily all of them. Unfortunately, we may not know ahead of time who the hidden experts in our midst are and thus it becomes risky to bet on a single presumed genius unless the evidence is overwhelming. Also, the best expert must be able to express this superior knowledge in terms of precise numerical estimates, as in our geography quiz, before you can take those subjective expert estimates to the bank. Several good books, including The Wisdom of Crowds, The Signal and the Noise and The Difference, delve more deeply into conditions favoring group judgments vs. betting on presumed experts.
Much hinges here on the distribution of knowledge in the group, as well as whether the random sources of noise in people's judgments are largely uncorrelated (in which case averaging really helps) or are highly correlated (in which case the group average may be badly biased). It also matters, in case group members broadly agree on some key estimate, how much they relied on similar vs. different sources. Insofar as they drew on different types of evidence and independently arrived at a similar conclusion, it tends to pay to "extremize" the group average, that is, to push it further in the direction the group already leans. Furthermore, the averaging process need not be based on equal weighting but might from the start give more weight to some members, in essence handicapping people's predictive acumen, a priori.
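To see why correlated noise matters so much, consider a simplified sketch. It assumes (our assumption, for illustration) that every group member's error has the same standard deviation and that every pair of members shares the same correlation. Under those assumptions the variance of the group average follows a standard textbook formula, which the hypothetical helper below computes.

```python
def variance_of_group_average(sd: float, n: int, rho: float) -> float:
    """Variance of the mean of n estimates.

    Simplifying assumptions: each estimate has the same standard
    deviation `sd`, and every pair of estimates has correlation `rho`.
    Formula: sd^2 * (1 + (n - 1) * rho) / n
    """
    return sd ** 2 * (1 + (n - 1) * rho) / n

# Thirty judges, each with an error SD of 10 units (illustrative numbers).
for rho in (0.0, 0.3, 0.8):
    print(f"rho={rho}: variance of average = {variance_of_group_average(10.0, 30, rho):.2f}")
```

With uncorrelated errors the variance shrinks in proportion to the group size. As the shared correlation rises, adding more judges buys less and less, because the common part of the error does not average out.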
A Numerical Aggregation Example
When weighting expert opinions differentially, one should take into account the degree of correlation among their judgments. There are a number of aggregation models available for combining expert opinions. We can't review them here but the following example conveys the flavor:
Suppose a group of experts has to predict the price of a barrel of Brent crude oil exactly one year from today. After some group discussion, you might ask each expert for a best guess as well as a confidence range around that estimate. For example, suppose that experts A, B and C provide the following mean guesses for next year's oil price, with the ranges expressed as standard deviations (SD) shown in parentheses: $60 (6), $62 (5) and $70 (7). The smaller the standard deviation, as subjectively provided by each expert, the more confident the expert is in his or her prediction. The model we use here assumes that the experts are well-calibrated and not overconfident.
An equal-weighting method, ignoring confidence levels for now, would place the group average at (60+62+70)/3 = $64. If we weight each expert differentially based on their SDs, the weights would become 32%, 45% and 23%, yielding a weighted average of $63. If we also want to take into account the degree to which the experts have overlapping worldviews, we need to assess the statistical dependencies among their opinions. Suppose that the correlation between the predictions of A and B has been .6 in the past for oil price estimates, .5 between A and C and .6 between B and C. If so, the Winkler aggregation model would then recommend weights of 26%, 67% and 7% respectively. Note that expert C now counts much less due to having the widest SD score (of 7) plus high correlations with the other two experts. In essence, expert C's incremental contribution, once we know what A and B think, is rather small.
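For readers who want to check the arithmetic, here is a minimal Python sketch of the three weighting schemes. It rests on our own assumptions about the mechanics: the SD-based weights are taken to be proportional to 1/SD squared, and the dependency-adjusted weights are taken to be proportional to the row sums of the inverse covariance matrix, which is the usual form of such a model under normally distributed errors. The helper names are hypothetical, and the small linear solver is included only to keep the example free of external libraries.

```python
means = [60.0, 62.0, 70.0]   # best guesses of experts A, B, C
sds = [6.0, 5.0, 7.0]        # their stated standard deviations
corr = [[1.0, 0.6, 0.5],     # historical correlations from the text
        [0.6, 1.0, 0.6],
        [0.5, 0.6, 1.0]]

def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def normalize(raw):
    """Rescale raw weights so they sum to 1."""
    total = sum(raw)
    return [r / total for r in raw]

def combine(weights, estimates):
    """Weighted average of the experts' estimates."""
    return sum(w * e for w, e in zip(weights, estimates))

# 1. Equal weights.
equal_w = [1 / 3] * 3
# 2. Assumption: weights proportional to 1 / SD^2 (more confident experts count more).
precision_w = normalize([1 / sd ** 2 for sd in sds])
# 3. Assumption: weights proportional to the row sums of the inverse covariance matrix.
cov = [[corr[i][j] * sds[i] * sds[j] for j in range(3)] for i in range(3)]
dependency_w = normalize(solve(cov, [1.0, 1.0, 1.0]))

for label, w in [("equal", equal_w), ("SD-based", precision_w), ("dependency-adjusted", dependency_w)]:
    print(label, [round(x, 3) for x in w], round(combine(w, means), 2))
```

Under these assumptions the sketch lands on the same figures quoted above, up to rounding: group estimates near $64, $63 and $62, with expert C's weight dropping to about 7% once the correlations are included.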
The overall effect of incorporating these dependencies among our three experts' views is that the group estimate for next year's oil price is now just $62, due to being pulled closer to A and B. These kinds of refinements in weights may not matter at all times, but at least there are solid methods supporting them.
--with Phil Tetlock, professor of psychology, University of Pennsylvania