
In practice, however, prediction markets are more complicated than the theory suggests. In the 2008 presidential election, for example, one of the most popular prediction markets, Intrade, experienced a series of strange fluctuations when an unknown trader started placing very large bets on John McCain, generating large spikes in the market’s prediction for a McCain victory. Nobody figured out who was behind these bets, but the suspicion was that it was a McCain supporter or even a member of the campaign. By manipulating the market prices, he or she was trying to create the impression that a respected source of election forecasts was calling the election for McCain, presumably with the hope of creating a self-fulfilling prophecy. It didn’t work. The spikes were quickly reversed by other traders, and the mystery bettor ended up losing money; thus the market functioned essentially as it was supposed to. Nevertheless, it exposed a potential vulnerability of the theory, which assumes that rational traders will not deliberately lose money. The problem is that if the goal of a participant is instead to manipulate perceptions of people outside the market (like the media) and if the amounts involved are relatively small (tens of thousands of dollars, say, compared with the tens of millions of dollars spent on TV advertising), then they may not care about losing money, in which case it’s no longer clear what signal the market is sending.
4

Problems like this one have led some skeptics to claim that prediction markets are not necessarily superior to other less sophisticated methods, such as opinion polls, that are harder to manipulate in practice. However, little attention has been paid to evaluating the relative performance of different methods, so nobody really knows for sure.
5
To try to settle the matter, my colleagues at Yahoo! Research and I conducted a systematic comparison of several different prediction methods, where the predictions in question were the outcomes of NFL football games. To begin with, for each of the fourteen to sixteen games taking place each weekend over the course of the 2008 season, we conducted a poll in which we asked respondents to state the probability that the home team would win as well as their confidence in their prediction. We also collected similar data from the website Probability Sports, an online contest where participants can win cash prizes by predicting the outcomes of sporting events. Next, we compared the performance of these two polls with the Vegas sports betting market—one of the oldest and most popular betting markets in the world—as well as with another prediction market, TradeSports. And finally, we compared the prediction of both the markets and the polls against two simple statistical models. The first model relied only on the historical probability that home teams win—which they do 58 percent of the time—while the second model also factored in the recent win-loss records of the two teams in question. In this way, we set up a six-way comparison between different prediction methods—two statistical models, two markets, and two polls.
6
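
For readers who like to see the mechanics, here is a minimal sketch in Python of what the two baseline models look like. The 58 percent home-team figure is the one quoted above; the particular win-loss adjustment, the invented sample games, and the use of a Brier score to grade the forecasts are illustrative assumptions, not a description of the exact implementation we used.

```python
# Minimal sketch of two baseline forecasting models for NFL games.
# Model 1: always predict the home team wins with the historical base rate (58%).
# Model 2: nudge that base rate using the two teams' recent win-loss records.
# The specific adjustment and the Brier score below are illustrative assumptions.

def model_home_advantage(game):
    """Predict P(home team wins) using only the historical base rate."""
    return 0.58

def model_win_loss(game, weight=0.5):
    """Shift the base rate by the difference in recent winning percentages.

    `weight` is a hypothetical parameter; in practice it would be fit
    from historical data rather than chosen by hand.
    """
    edge = game["home_recent_win_pct"] - game["away_recent_win_pct"]
    p = 0.58 + weight * edge
    return min(max(p, 0.01), 0.99)  # keep the probability in a sensible range

def brier_score(predictions, outcomes):
    """Mean squared error of probabilistic forecasts (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)

# Hypothetical example: one weekend of games, outcome 1 = home team won.
games = [
    {"home_recent_win_pct": 0.75, "away_recent_win_pct": 0.25},
    {"home_recent_win_pct": 0.40, "away_recent_win_pct": 0.60},
]
outcomes = [1, 0]

for model in (model_home_advantage, model_win_loss):
    preds = [model(g) for g in games]
    print(model.__name__, round(brier_score(preds, outcomes), 3))
```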

Given how different these methods were, what we found was surprising: All of them performed about the same. To be fair, the two prediction markets performed a little better than the other methods, which is consistent with the theoretical argument above. But the very best-performing method—the Las Vegas market—was only about 3 percentage points more accurate than the worst-performing method, which was the model that always predicted the home team would win with 58 percent probability. All the other methods were somewhere in between. In fact, the model that also included recent win-loss records was so close to the Vegas market that if you used both methods to predict the actual point differences between the teams, the average error in their predictions would differ by less than a tenth of a point. Now, if you’re betting on the outcomes of hundreds or thousands of games, these tiny differences may still be the difference between making and losing money. At the same time, however, it’s surprising that the aggregated wisdom of thousands of market participants, who collectively devote countless hours to analyzing upcoming games for any shred of useful information, is only incrementally better than a simple statistical model that relies only on historical averages.

When we first told some prediction market researchers about this result, their reaction was that it must reflect some special feature of football. The NFL, they argued, has lots of rules like salary caps and draft picks that help to keep teams as equal as possible. And football, of course, is a game where the result can be decided by tiny random acts, like the wide receiver dragging in the quarterback’s desperate pass with his fingertips as he runs full tilt across the goal line to win the game in its closing seconds. Football games, in other words, have a lot of randomness built into them—arguably, in fact, that’s what makes them exciting. Perhaps it’s not so surprising after all, then, that all the information and analysis that is generated by the small army of football pundits who bombard fans with predictions every week is not super helpful (although it might be surprising to the pundits). In order to be persuaded, our colleagues insisted, we would have to find the same result in some other domain for which the signal-to-noise ratio might be considerably higher than it is in the specific case of football.

OK, what about baseball? Baseball fans pride themselves on their near-fanatical attention to every measurable detail of the game, from batting averages to pitching rotations. Indeed, an entire field of research called sabermetrics has developed specifically for the purpose of analyzing baseball statistics, even spawning its own journal, the Baseball Research Journal. One might think, therefore, that prediction markets, with their far greater capacity to factor in different sorts of information, would outperform simplistic statistical models by a much wider margin for baseball than they do for football. But that turns out not to be true either. We compared the predictions of the Las Vegas sports betting markets over nearly twenty thousand Major League Baseball games played from 1999 to 2006 with a simple statistical model based again on home-team advantage and the recent win-loss records of the two teams. This time, the difference between the two was even smaller—in fact, the performances of the market and the model were indistinguishable. In spite of all the statistics and analysis, in other words, and in spite of the absence of meaningful salary caps in baseball and the resulting concentration of superstar players on teams like the New York Yankees and Boston Red Sox, the outcomes of baseball games are even closer to random events than those of football games.

Since then, we have either found or learned about the same kind of result for other kinds of events that prediction markets have been used to predict, from the opening weekend box office revenues for feature films to the outcomes of presidential elections. Unlike sports, these events occur without any of the rules or conditions that are designed to make sports competitive. There is also a lot of relevant information that prediction markets could conceivably exploit to boost their performance well beyond that of a simple model or a poll of relatively uninformed individuals. Yet when we compared the Hollywood Stock Exchange (HSX)—one of the most popular prediction markets, which has a reputation for accurate prediction—with a simple statistical model, the HSX did only slightly better.
7
And in a separate study of the outcomes of five US presidential elections from 1988 to 2004, political scientists Robert Erikson and Christopher Wlezien found that a simple statistical correction of ordinary opinion polls outperformed even the vaunted Iowa Electronic Markets.
8

TRUST NO ONE, ESPECIALLY YOURSELF

So what’s going on here? We are not really sure, but our suspicion is that the strikingly similar performance of different methods is an unexpected side effect of the prediction puzzle from the previous chapter. On the one hand, when it comes to complex systems—whether they involve sporting matches, elections, or movie audiences—there are strict limits to how accurately we can predict what will happen. But on the other hand, it seems that one can get pretty close to the limit of what is possible with relatively simple methods. By analogy, if you’re handed a weighted die, you might be able to figure out which sides will come up more frequently in a few dozen rolls, after which you would do well to bet on those outcomes. But beyond that, more elaborate methods like studying the die under a microscope to map out all the tiny fissures and irregularities on its surface, or building a complex computer simulation, aren’t going to help you much in improving your prediction.
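
The analogy can be made concrete with a short simulation. The sketch below invents a set of weights for the die and shows that, typically, a few dozen rolls already pin down the rough frequencies, while multiplying the effort a hundredfold sharpens the estimate only modestly.

```python
import random

# Illustrative sketch of the weighted-die analogy: a few dozen rolls are
# usually enough to spot which faces come up more often, and piling on more
# data (or more elaborate analysis) improves the estimate only marginally.
# The weights below are invented purely for the example.

true_weights = [0.10, 0.10, 0.15, 0.15, 0.20, 0.30]  # one weight per face

def roll(n):
    """Simulate n rolls of the weighted die (faces indexed 0 through 5)."""
    return random.choices(range(6), weights=true_weights, k=n)

def estimated_frequencies(rolls):
    """Empirical frequency of each face in a list of rolls."""
    return [rolls.count(face) / len(rolls) for face in range(6)]

random.seed(1)
for n in (36, 360, 3600):
    est = estimated_frequencies(roll(n))
    error = max(abs(e - t) for e, t in zip(est, true_weights))
    print(f"{n:>5} rolls: worst-case error in estimated frequencies = {error:.3f}")
```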

In the same way, we found that with football games a single piece of information—that the home team wins slightly more than half the time—is enough to boost one’s performance in predicting the outcome above random guessing. In addition, a second simple insight, that the team with the better win-loss record should have a slight advantage, gives you another significant boost. Beyond that, however, all the additional information you might consider gathering—the recent performance of the quarterback, the injuries on the team, the girlfriend troubles of the star running back—will only improve your predictions incrementally at best. Predictions about complex systems, in other words, are highly subject to the law of diminishing returns: The first pieces of information help a lot, but very quickly you exhaust whatever potential for improvement exists.

Of course, there are circumstances in which we may care about very small improvements in prediction accuracy. In online advertising or high-frequency stock trading, for example, one might be making millions or even billions of predictions every day, and large sums of money may be at stake. Under these circumstances, it’s probably worth the effort and expense to invest in sophisticated methods that can exploit the subtlest patterns. But in just about any other business, from making movies or publishing books to developing new technologies, where you get to make only dozens or at most hundreds of predictions a year, and where the predictions you are making are usually just one aspect of your overall decision-making process, you can probably predict about as well as possible with the help of a relatively simple method.

The one method you don’t want to use when making predictions is to rely on a single person’s opinion—especially not your own. The reason is that although humans are generally good at perceiving which factors are potentially relevant to a particular problem, they are generally bad at estimating how important one factor is relative to another. In predicting the opening weekend box office revenue for a movie, for example, you might think that variables such as the movie’s production and marketing budgets, the number of screens on which it will open, and advance ratings by reviewers are all highly relevant—and you’d be correct. But how much should you weight a slightly worse-than-average review against an extra $10 million marketing budget? It isn’t clear. Nor is it clear, when deciding how to allocate a marketing budget, how much people will be influenced by the ads they see online or in a magazine versus what they hear about the product from their friends—even though all these factors are likely to be relevant.

You might think that making these sorts of judgments accurately is what experts would be good at, but as Tetlock showed in his experiment, experts are just as bad at making quantitative predictions as nonexperts and maybe even worse.
9
The real problem with relying on experts, however, is not that they are appreciably worse than nonexperts, but rather that because they are experts we tend to consult only one at a time. Instead, what we should do is poll many individual opinions—whether experts or not—and take the average. Precisely how you do this, it turns out, may not matter so much. With all their fancy bells and whistles, prediction markets may produce slightly better predictions than a simple method like a poll, but the difference between the two is much less important than the gain from simply averaging lots of opinions somehow. Alternatively, one can estimate the relative importance of the various predictors directly from historical data, which is really all a statistical model accomplishes. And once again, although a fancy model may work slightly better than a simple model, the difference is small compared with the gain over using no model at all.
10
At the end of the day, both models and crowds accomplish the same objective. First, they rely on some version of human judgment to identify which factors are relevant to the prediction in question. And second, they estimate and weight the relative importance of each of these factors. As the psychologist Robyn Dawes once pointed out, “the whole trick is to know what variables to look at and then know how to add.”
11
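
Dawes’s point lends itself to a very small amount of code. The sketch below does two things: it averages several individual probability estimates instead of trusting any single one, and it estimates the weights on a couple of relevant variables from historical data and then simply adds them up. All of the numbers, the variables, and the least-squares fit are invented for illustration.

```python
# Two ways of implementing "know what variables to look at and then know
# how to add", sketched with invented numbers.

# 1. Average many individual opinions instead of trusting a single expert.
#    Each number is one person's estimate of P(home team wins).
opinions = [0.55, 0.70, 0.62, 0.48, 0.66, 0.58]
crowd_forecast = sum(opinions) / len(opinions)
print(f"averaged forecast: {crowd_forecast:.2f}")

# 2. Estimate the relative importance of a few predictors from historical
#    data, then add them up with those weights. Here: a toy least-squares
#    fit of opening-weekend revenue on marketing budget and screen count.
#    (The data and variables are hypothetical; any regression routine would do.)
history = [
    # (marketing budget $M, opening screens in thousands, revenue $M)
    (30, 3.0, 45),
    (50, 3.5, 70),
    (20, 2.5, 28),
    (80, 4.0, 110),
]

def fit_two_variable_weights(rows):
    """Crude least squares via the normal equations for y ~ w1*x1 + w2*x2."""
    s11 = sum(x1 * x1 for x1, x2, y in rows)
    s12 = sum(x1 * x2 for x1, x2, y in rows)
    s22 = sum(x2 * x2 for x1, x2, y in rows)
    s1y = sum(x1 * y for x1, x2, y in rows)
    s2y = sum(x2 * y for x1, x2, y in rows)
    det = s11 * s22 - s12 * s12
    w1 = (s22 * s1y - s12 * s2y) / det
    w2 = (s11 * s2y - s12 * s1y) / det
    return w1, w2

w_budget, w_screens = fit_two_variable_weights(history)
predicted = w_budget * 40 + w_screens * 3.2  # a new, hypothetical film
print(f"predicted opening revenue: ${predicted:.0f}M")
```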

By applying this trick consistently, one can also learn over time which predictions can be made with relatively low error, and which cannot. All else being equal, for example, the further in advance you predict the outcome of an event, the larger your error will be. It is simply harder to predict the box office potential of a movie at the green-light stage than a week or two before its release, no matter what methods you use. In the same way, predictions about new product sales, say, are likely to be less accurate than predictions about the sales of existing products, no matter when you make them. There’s nothing you can do about that, but what you can do is start using any one of several different methods—or even use all of them together, as we did in our study of prediction markets—and keep track of their performance over time. As I mentioned at the beginning of the previous chapter, keeping track of our predictions is not something that comes naturally to us: We make lots of predictions, but rarely check back to see how often we got them right. But keeping track of performance is possibly the most important activity of all—because only then can you learn how accurately it is possible to predict, and therefore how much weight you should put on the predictions you make.
12
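
Keeping track need not be elaborate. A minimal sketch of the habit: log every prediction alongside its eventual outcome, grouped by the kind of event, and periodically compute an accuracy score for each group. The categories, probabilities, and the choice of a Brier score below are illustrative, not a prescription.

```python
from collections import defaultdict

# Minimal sketch of a prediction log: record each forecast with its eventual
# outcome, then summarize accuracy by category so you learn, over time, how
# predictable each kind of event actually is. Categories and data are invented.

log = defaultdict(list)  # category -> list of (predicted probability, outcome)

def record(category, probability, outcome):
    """Store one forecast (probability of the event) and what actually happened."""
    log[category].append((probability, outcome))

def brier(category):
    """Mean squared error of the forecasts in a category (lower is better)."""
    pairs = log[category]
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)

# Hypothetical entries accumulated over a season or a release schedule.
record("nfl_game", 0.58, 1)
record("nfl_game", 0.58, 0)
record("movie_opening_beats_estimate", 0.70, 1)
record("movie_opening_beats_estimate", 0.40, 1)

for category in log:
    print(f"{category}: Brier score {brier(category):.3f} over {len(log[category])} predictions")
```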
