Super Crunchers (2 page)

Authors: Ian Ayres

Runs Created = (Hits + Walks) × Total Bases / (At Bats + Walks)
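As a rough illustration, the formula can be written out as a short Python function. The function name, the argument names, and the two stat lines below are hypothetical and chosen only for illustration; they are not real players. The sketch holds hits, total bases, and at bats fixed and varies only walks.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Runs Created = (Hits + Walks) * Total Bases / (At Bats + Walks)."""
    opportunities = at_bats + walks  # denominator of the formula
    if opportunities == 0:
        return 0.0  # assumption: treat a player with no opportunities as zero
    return (hits + walks) * total_bases / opportunities


# Two hypothetical hitters, identical except for how often they walk.
free_swinger = runs_created(hits=150, walks=20, total_bases=240, at_bats=500)
patient_hitter = runs_created(hits=150, walks=100, total_bases=240, at_bats=500)

print(f"free swinger:   {free_swinger:.1f}")
print(f"patient hitter: {patient_hitter:.1f}")
```

With everything else held equal, the hitter who walks more comes out with the higher Runs Created figure.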

This equation puts much more emphasis on a player's on-base percentage and gives especially high ratings to players who tend to walk more often. James's number-crunching approach was particularly anathema to scouts. If wine critics like Robert Parker live by their taste and smell, then scouts live and die by their eyes. That's their value added. As described by Lewis:

In the scouts' view, you found big league ball players by driving sixty thousand miles, staying in a hundred crappy motels, and eating God knows how many meals at Denny's all so you could watch 200 high school and college baseball games inside of four months, 199 of which were completely meaningless to you…. [Y]ou would walk into the ball park, find a seat on the aluminum plank in the fourth row directly behind the catcher and see something no one else had seen, at least no one who knew the meaning of it. You only had to see him once. “If you see it once, it's there.”

The scouts and wine critics like Robert Parker have more in common than simply a penchant for spitting. Just as Parker believes that he can assess the quality of a Château's vintage based on a single taste, baseball scouts believe they can assess the quality of the high school prospect based on a single viewing.

In both contexts, people are trying to predict the market value of untested and immature products, whether they be grapes or baseball players. And in both contexts, the central dispute is whether to rely on observational expertise or quantitative data.

Like wine critics, baseball scouts often resort to non-falsifiable euphemisms, such as “He's a real player,” or “He's a tools guy.”

In Moneyball, the conflict between data and traditional expertise came to a head in 2002 when Oakland A's general manager Billy Beane wanted to draft Jeremy Brown. Beane had read James and decided he was going to draft based on hard numbers. Beane loved Jeremy Brown because he had walked more frequently than any other college player. The scouts hated him because he was, well, fat. If he tried to run in corduroys, an A's scout sneered, “he'd start a fire.” The scouts thought there was no way that someone with his body could play major league ball. Beane couldn't care less how a player looked. His drafting mantra was “We're not selling jeans.” Beane just wanted to win games. The scouts seem to be wrong. Brown has progressed faster than anyone else the A's drafted that year. In September 2006, he was called up for his major league debut with the A's and batted .300 (with an on-base percentage of .364).

There are striking parallels between the ways that Ashenfelter and James originally tried to disseminate their number-crunching results. Just like Ashenfelter, James began by placing small ads for his first newsletter, Baseball Abstracts (which he euphemistically characterized as a book). In the first year, he sold a total of seventy-five copies. Just as Ashenfelter was locked out of Wine Spectator, James was given the cold shoulder by the Elias Sports Bureau when he asked to share data.

But James and Ashenfelter have forever left their marks upon their industries. The perennial success of the Oakland A's, detailed in Moneyball, and even the first world championship of the Boston Red Sox under the data-driven management of Theo Epstein are tributes to James's lasting influence. The improved weather-driven predictions of even traditional wine writers are silent tributes to Ashenfelter's impact.

Both have even given rise to gearhead groups that revel in their brand of number crunching. James inspired SABR, the Society for American Baseball Research. Baseball number crunching now even has its own name, sabermetrics. In 2006, Ashenfelter in turn helped launch the Journal of Wine Economics. There's even an Association of Wine Economists now. Unsurprisingly, Ashenfelter is its first president. By the way, Ashenfelter's first predictions in hindsight are looking pretty darn good. I looked up recent auction prices for Château Latour and sure enough the '89s were selling for more than twice the price of the '86s, and 1990 bottles were priced even higher. Take that, Robert Parker.

In Vino Veritas

This book's central claim is that the rise of number crunching in wine and its rise in baseball are not isolated events. In fact, the wine and baseball examples are microcosms of the larger themes of this book. We are in a historic moment of horse-versus-locomotive competition, where intuitive and experiential expertise is losing out time and time again to number crunching. In the old days, many decisions were simply based on some mixture of experience and intuition. Experts were ordained because of their decades of individual trial-and-error experience. We could trust that they knew the best way to do things, because they'd done it hundreds of times in the past. Experiential experts had survived and thrived. If you wanted to know what to do, you'd ask the gray-hairs.

Now something is changing. Business and government professionals are relying more and more on databases to guide their decisions. The story of hedge funds is really the story of a new breed of number crunchers—call them Super Crunchers—who have analyzed large datasets to discover empirical correlations between seemingly unrelated things. Want to hedge a large purchase of euros? Turns out you should sell a carefully balanced portfolio of twenty-six other stocks and commodities that might include Wal-Mart stock.

What is Super Crunching? It is statistical analysis that impacts real-world decisions. Super Crunching predictions usually bring together some combination of size, speed, and scale. The sizes of the datasets are really big—both in the number of observations and in the number of variables. The speed of the analysis is increasing. We often witness the real-time crunching of numbers as the data come hot off the press. And the scale of the impact is sometimes truly huge. This isn't a bunch of egghead academics cranking out provocative journal articles. Super Crunching is done by or for decision makers who are looking for a better way to do things.

And when I say that Super Crunchers are using large datasets, I mean really large. Increasingly business and government datasets are being measured not in mega- or gigabytes but in tera- and even petabytes (1,000 terabytes). A terabyte is the equivalent of 1,000 gigabytes. The prefix tera comes from the Greek word for monster. A terabyte is truly a monstrously large quantity. The entire Library of Congress is about twenty terabytes of text. Part of the point of this book is that we need to start getting used to this prefix. Wal-Mart's data warehouse, for example, stores more than 570 terabytes. Google has about four petabytes of storage which it is constantly crunching. Tera mining is not Buck Rogers's fantasy; it's being done right now.
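To get a feel for these magnitudes, the figures quoted above can be converted into a common yardstick. The short Python sketch below uses only the numbers given in the text (a Library of Congress of about twenty terabytes, a Wal-Mart warehouse of 570 terabytes, four petabytes at Google); the constant and function names are made up for illustration, and the results are back-of-the-envelope ratios, not measurements.

```python
TB_PER_PB = 1_000            # 1 petabyte = 1,000 terabytes
LIBRARY_OF_CONGRESS_TB = 20  # "about twenty terabytes of text"


def in_libraries_of_congress(terabytes):
    """Express a storage size as multiples of the Library of Congress text."""
    return terabytes / LIBRARY_OF_CONGRESS_TB


walmart_tb = 570            # figure quoted in the text ("more than 570")
google_tb = 4 * TB_PER_PB   # "about four petabytes"

walmart_libraries = in_libraries_of_congress(walmart_tb)
google_libraries = in_libraries_of_congress(google_tb)

print(f"Wal-Mart: about {walmart_libraries:.1f} Libraries of Congress")
print(f"Google:   about {google_libraries:.0f} Libraries of Congress")
```

On these rough numbers, Wal-Mart's warehouse holds the text of the Library of Congress more than two dozen times over, and Google's storage holds it a couple of hundred times over.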

In field after field, “intuitivists” and traditional experts are battling Super Crunchers. In medicine, a raging controversy over what is called “evidence-based medicine” boils down to a question of whether treatment choice will be based on statistical analysis or not. The intuitivists are not giving up without a fight. They claim that a database can never capture clinical expertise nurtured over a lifetime of experience, that a regression can never be as good as an emergency room nurse with twenty years of experience who can tell whether a kid looks “hinky.”

We tend to think that the chess grandmaster Garry Kasparov lost to the Deep Blue computer because of IBM's smarter software. That software is really a gigantic database that ranks the power of different positions. The speed of the computer is important, but in large part it was the computer's ability to access a database of 700,000 grandmaster chess games that was decisive. Kasparov's intuitions lost out to data-based decision making.

Super Crunchers are not just invading and displacing traditional experts; they're changing our lives. They're not just changing the way that decisions are made; they're changing the decisions themselves. Baseball scouts are losing out to gearheads not just because it's a lot cheaper to crunch numbers than to fly scouts out to Palookaville. The scouts are losing because they make poorer predictions. Super Crunchers and experts, of course, don't always disagree. Number crunching sometimes confirms traditional wisdom. The world isn't so perverse that the traditional experts were wrong 100 percent of the time or were even no better than chance. Still, number crunching is leading decision makers to make different and, by and large, better choices.

Statistical analysis in field after field is uncovering hidden relationships among widely disparate kinds of information. If you're a politician and want to know who is most likely to give you a contribution and what form of solicitation is most likely to be successful, you don't need to guess, follow rules of thumb, or trust grizzled traditionalists. Increasingly, it is possible to tease out measurable effects of separate attributes to tell you what kinds of persuasion are likely to work the best. Trolling through databases can reveal underlying causes that traditional experts never even considered.

Data-based decision making is on the rise all around us:

                  Rental car companies and insurers are refusing service to people with poor credit scores because data mining tells them that credit scores correlate with a higher likelihood of having an accident.

                  Nowadays when a flight is canceled, airlines will skip over their frequent fliers and give the next open seat to the mine-identified customer whose continued business is most at risk. Instead of following a first-come, first-served rule, companies will condition their behavior on literally dozens of consumer-specific factors.

                  The “No Child Left Behind” Act, which requires schools to adopt teaching methods supported by rigorous data analysis, is causing teachers to spend up to 45 percent of class time training kids to pass standardized tests. Super Crunching is even shifting some teachers toward class lessons where every word is scripted and statistically vetted.

Intuitivists beware. This book will detail a dizzying array of Super Crunching stories and introduce you to the people who are making them happen. The number-crunching revolution isn't just about baseball or even sports in general. It is about all the rest of our lives as well. Many times this Super Crunching revolution is a boon to consumers as it helps sellers and governments make better predictions about who needs what. At other times, however, consumers are playing against a statistically stacked deck. Number crunching can put the little guy at a real disadvantage, since sellers can better predict how much they can squeeze out of us.

Steven D. Levitt and Stephen J. Dubner showed in Freakonomics dozens of examples of how statistical analysis of databases can reveal the secret levers of causation. Levitt and John Donohue (both my coauthors and friends, about whom you will hear more later) showed that seemingly unrelated events like the abortion rate in 1970 and the crime rate in 1990 have an important connection. Yet Freakonomics didn't talk much about the extent to which quantitative analysis is impacting real-world decisions. In contrast, this book is about just that: the impact of number crunching. Decision makers inside and outside of business are using statistical analysis in ways you'd never imagine to drive all kinds of choices.
