Authors: Ian Ayres
In the Steven Spielberg movie Minority Report, Tom Cruise's character was bombarded with personalized electronic ads that recognized and called out to him as he walked through a mall. For the moment, this is still the stuff of science fiction. But we're getting closer. Passive identification is now coming to the web. PolarRose.com is using face recognition to improve the quality of image searches. Google image searches currently rely on the text that appears near web images. PolarRose, on the other hand, creates 3-D renditions of faces, codes for ninety different facial attributes, and then matches each image against an ever-growing database. Suddenly, if you happen to be walking by in the background when a tourist snaps a picture, the whole world could learn where you were. Any photo subsequently posted to websites like flickr.com could reveal your whereabouts.
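The text doesn't describe how PolarRose actually performs its matching. As a hedged illustration only, assuming each face has already been reduced to a fixed-length vector of numeric attribute scores, the matching step can be sketched as a nearest-neighbor search with a distance cutoff. The database, the names, and the threshold below are all hypothetical.

```python
import math
from typing import Dict, List, Optional

# Hypothetical database: identity -> vector of facial-attribute scores.
# The text mentions ninety attributes; three are used here to keep it short.
EXAMPLE_DB: Dict[str, List[float]] = {
    "alice": [0.10, 0.90, 0.40],
    "bob": [0.80, 0.20, 0.50],
}


def best_match(probe: List[float],
               database: Dict[str, List[float]],
               threshold: float) -> Optional[str]:
    """Return the identity whose stored attribute vector is nearest (by
    Euclidean distance) to the probe face, or None if even the nearest one
    is farther away than `threshold`."""
    best_name: Optional[str] = None
    best_dist = math.inf
    for name, vector in database.items():
        dist = math.dist(probe, vector)  # requires Python 3.8+
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None


# A face close to Alice's stored vector is identified as her;
# a face far from every stored vector is left unidentified.
print(best_match([0.12, 0.88, 0.41], EXAMPLE_DB, threshold=0.1))
print(best_match([0.50, 0.50, 0.50], EXAMPLE_DB, threshold=0.1))
```

The threshold is the design choice that matters most in a sketch like this: loosen it and more bystanders get identified, some of them wrongly; tighten it and more faces go unmatched.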
Most of the popular discussion of face recognition emphasizes the software that must successfully “code” different facial attributes. But make no mistake: facial recognition is Super Crunching in search of high-probability predictions. And once you've been identified, the Super Crunching cat is out of the bag, as increasing numbers of people will be able to determine what library books you forgot to return, which politicians you gave money to, what real estate you own, and countless other bytes of data about you. Walking into H&H and buying a bagel is technically public, but for most of us non-celebrities, public anonymity has allowed a vast range of unobserved freedom of movement. Super Crunching is shrinking the sphere in which we can be private in public.
Sherlock Holmes was famous for deducing intricate details about a person's past from a few clues about their present. But given access to far richer datasets, Super Crunchers can put Holmesian prediction to shame: given these 250 variables, there's a 93 percent chance that you voted for Nader. It's elementary, my dear Watson.
Data mining does more than give new meaning to “I know what you did last summer.” Super Crunchers can also make astute predictions about what you will do next summer. Traditionally, the right to privacy has been about preserving past and present information. There was no need to worry about keeping future information private: since future information didn't exist yet, there was nothing to keep private. Yet data-mining predictions raise just this concern. In a sense, Super Crunching puts our future privacy at risk because it can probabilistically predict what we will do. Super Crunching moves us toward a kind of statistical predeterminism.
The 1997 sci-fi thriller Gattaca imagined a world in which genetics was destiny. The hero's parents were told at his birth that he had a 42 percent chance of manic depression and a life expectancy of 34.2 years. But right now, Super Crunchers can already look at a collection of innocuous past behaviors and make chillingly accurate assessments of the future. For example, it's a little unnerving to think that Visa, with a little mining of my credit card charges, could make a pretty accurate guess about whether I'll divorce in the next five years.
“Data huggers,” the people who fear untoward uses of public data, have a lot to be worried about. Google's corporate mission is “to organize the world's information and make it universally accessible and useful.” This ambitious goal is seductively attractive. However, it is not counterbalanced by any concern for privacy. Data-driven predictions are creating new dimensions in which our past and even future actions will be “universally accessible.”
The slow erosion of the private sphere makes it harder to realize what is happening and rally resistance. Like a frog slowly boiling to death, we don't notice that our environment is changing. People in Israel now expect to be repeatedly checked with metal detectors as they go about their daily tasks. Sometimes incremental steps of “progress” take us down a path where collectively we end up eating hothouse tomatoes and Wonder Bread that are at best a semblance of food. The fear is that number crunching will similarly degrade our lives.
Newspaper reporters feel compelled to quote privacy pundits who raise concerns and call for debate. Yet when it comes down to it, most people don't seem to value their privacy very much. The Ponemon Institute estimates that only 7 percent of Americans change their behavior to preserve their privacy. Rare is the person who will reject the E-ZPass system (and its discount) because it can track car movements. Carnegie Mellon economist Alessandro Acquisti has found that people are happy to surrender their Social Security numbers for just a fifty-cents-off coupon. Individually, we're willing to sell our privacy. Sun Microsystems cofounder and CEO Scott McNealy famously declared in 1999 that we “have no privacy. Get over it.” Many of us already have.
Super Crunching affects us not only as customers and employees but also as citizens. I, for one, am not worried about people googling me or predicting my actions. The benefits of indexing and crunching the world's information far outweigh the costs. Other citizens may reasonably disagree. One thing is certain: consumer pressure by itself is not likely to restrain the Super Crunching onslaught. The data huggers of the world need to unite (and convince Congress) if the excesses of data-based decision making are to be constrained.
Truth is often a defense. But even true predictions at times may hurt customers and employees if they allow firms to take advantage of us, and predictions can hurt us as citizens if they allow others to inappropriately invade our past, present, or future privacy. The larger concern is about inaccurate (untrue) predictions. Without appropriate protections, they can hurt everybody.
Who Is John Lott?
On September 23, 2002, Mary Rosh posted to the web a rather harsh criticism of an empirical paper that I wrote with my colleague John Donohue. Rosh said:
The Ayres and Donohue piece is a joke. I saw it a while ago…. A friend at the Harvard Law School said that Donohue gave the paper there and he was demolished…
The article that Rosh was criticizing was about the impact of concealed handgun laws on crime. It was a response to John Lott's “More Guns, Less Crime” claim. Lott created a huge dataset to analyze the impact that concealed weapon laws had on crime. His startling finding was that states that passed laws making it easy for law-abiding citizens to carry concealed weapons experienced a substantial decrease in crime. Lott believed that criminals would be less likely to commit crimes if they couldn't be sure whether their victims were armed.
Donohue and I took Lott's data and ran thousands of regressions exploring the same issue. Our article refuted Lott's central claim. In fact, we found twice as many states that experienced a statistically significant increase in crime after passage of the law as states that experienced a significant decrease. Overall, however, we found that the changes were not substantial, and these concealed weapon laws might not affect crime one way or the other.
That's when Mary Rosh weighed in on the web. Her comment isn't so remarkable for its content; that's part of the rough and tumble of academic disputes. The comment is remarkable because Mary Rosh is really John Lott. Mary Rosh was a “sock puppet” pseudonym (based on the first two letters of the names of his four sons). Posting as Rosh, Lott wrote dozens upon dozens of web comments praising his own merits and slamming the work of his opponents. Rosh, for example, identified herself as a former student of Lott's and extolled his teaching. “I have to say that he was the best professor that I ever had,” she wrote. “You wouldn't know that he was a ‘right-wing ideologue’ from the class.”
Lott is a complicated and tortured soul. He is often the smartest guy in the room. He comes to seminars and debates consummately prepared. I first met him at the University of Chicago when I was delivering a statistical paper on New Haven bail bondsmen. Lott had not only read my paper carefully, he had looked up the phone numbers of New Haven bail bondsmen and called them. I was blown away.
He is incredibly combative in public, but just as soft-spoken, even meek, when speaking one-on-one. Lott is also a physical presence. He is tall and has striking features, Ichabod Crane-like in their lack of proportion. Mary Rosh has even described him:
I had Lott as a teacher about a decade ago, and he has a quite noticable [sic] scar across his forehead. It looked like it cut right through his eyebrows going the entire width of his forehead. [T]he scar was so extremely noticable [sic] that people talked and joked about it. Some students claimed that he had major surgery when he was a child.
Before the Mary Rosh dissembling, I was instrumental in bringing John to Yale Law School for two years as a research fellow. Make no mistake, John Lott has some serious number-crunching skills.
His concealed-weapon empiricism was quickly picked up by gun-rights advocates and politicians as a reason to oppose efforts at gun control and advance the cause of greater freedom to carry guns. In the same year that Lott's initial article was published, Senator Larry Craig (R-Idaho) introduced the Personal Safety and Community Protection Act, which was designed to facilitate the carrying of concealed firearms by nonresidents of a state who had obtained valid permits to carry such weapons in their home state. Senator Craig argued that the work of John Lott showed that arming the citizenry via laws allowing the carrying of concealed handguns would have a protective effect for the community at large because criminals would find themselves in the line of fire.
Lott has repeatedly been asked to testify before state legislatures in support of concealed gun laws. Since Lott's original article was published in 1998, nine additional states have passed his favored statute. This book is about the impact that Super Crunching is having on real-world decisions. It's hard to know for sure whether Lott's regressions were a but-for cause of these new statutes. Still, Lott and his “More Guns/Less Crime” regressions have had the kind of influence that most academics can only dream of.
Lott generously made his dataset available not only to Donohue and me, but to anyone who asked. And we dug in, double-checking the calculations and testing whether his results held up if we slightly changed some of his assumptions. Econometricians call this testing to see whether results are “robust.”
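The text doesn't spell out which specification changes were tried. As a deliberately simple stand-in, here is one common kind of robustness check, leave-one-out re-estimation, applied to a one-variable least-squares regression. The data are made up, and this is not the procedure used on Lott's data; it only illustrates the idea of re-running an estimate under small changes and seeing whether the answer survives.

```python
from statistics import mean
from typing import List


def ols_slope(xs: List[float], ys: List[float]) -> float:
    """Slope b of the least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variation; slope is undefined")
    return sxy / sxx


def leave_one_out_slopes(xs: List[float], ys: List[float]) -> List[float]:
    """Re-estimate the slope once per observation, each time dropping that
    observation. If the estimates swing widely, the full-sample result
    leans heavily on individual data points and is not robust."""
    return [ols_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
            for i in range(len(xs))]


# Made-up toy data: the positive full-sample slope is driven by one outlier.
xs = [0.0, 1.0, 2.0, 3.0, 10.0]
ys = [0.0, 0.0, 0.0, 0.0, 10.0]
print(round(ols_slope(xs, ys), 3))
print([round(b, 3) for b in leave_one_out_slopes(xs, ys)])
```

With the outlier included, the slope is clearly positive; drop that single observation and it falls to zero. Real robustness checks vary many modeling choices (control variables, functional form, sample period), but the logic is the same.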
We had two big surprises. First, we found that if you made innocuous changes in Lott's regression equation, the crime-reducing impacts of Lott's laws often vanished. More disturbingly, we found that Lott had made a coding mistake in creating some of his underlying data. For example, in many of his regressions, Lott tried to control for whether the crime took place in a particular region (say, the Northeast) in a particular year (say, 1988). But when we looked at his data, many of these variables were mistakenly set to zero. When we estimated his formula on the corrected data, we again found that these laws were more likely to increase the rate of crime.
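The text doesn't say how Lott's region-year variables were built or exactly how some ended up as zeros. Assuming a simple dataset in which each observation records its region and year, the sketch below (hypothetical names and data throughout) shows how such indicator, or "dummy," variables are typically constructed, plus one cheap sanity check: recompute what each dummy should be from the raw region and year columns and flag any that disagree.

```python
from itertools import product
from typing import Dict, List

Row = Dict[str, object]


def add_region_year_dummies(rows: List[Row], regions: List[str],
                            years: List[int]) -> None:
    """Give every observation one 0/1 indicator per (region, year) pair,
    set to 1 only when the observation is from that region in that year."""
    for row in rows:
        for region, year in product(regions, years):
            row[f"{region}_{year}"] = int(row["region"] == region
                                          and row["year"] == year)


def mismatched_dummies(rows: List[Row], regions: List[str],
                       years: List[int]) -> List[str]:
    """Return the names of region-year dummies whose stored value disagrees,
    in any observation, with that observation's own region and year.
    A missing dummy counts as a mismatch."""
    flagged = []
    for region, year in product(regions, years):
        name = f"{region}_{year}"
        if any(r.get(name) != int(r["region"] == region and r["year"] == year)
               for r in rows):
            flagged.append(name)
    return flagged


# Hypothetical observations.
rows: List[Row] = [
    {"region": "Northeast", "year": 1988},
    {"region": "South", "year": 1988},
    {"region": "Northeast", "year": 1989},
]
regions, years = ["Northeast", "South"], [1988, 1989]
add_region_year_dummies(rows, regions, years)
print(mismatched_dummies(rows, regions, years))  # []

# Simulate the kind of error described above: a dummy wrongly set to zero.
rows[0]["Northeast_1988"] = 0
print(mismatched_dummies(rows, regions, years))  # ['Northeast_1988']
```

A check like this costs one pass over the data, and the same pattern (rebuild a derived column from its raw inputs and compare) applies to many of the transformations a large dataset goes through before a regression is run.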
Let me stress that both of these mistakes are the kind of errors that I or just about any other Super Cruncher might make, especially the coding error; there but for the grace of God go I. There are literally hundreds of data manipulations needed to get a large dataset in shape to run a regression. If the gearhead makes a mistake on any one of the transformations, the bottom-line predictions may be inaccurate. I have no reason to think that Lott purposefully miscoded his data to produce predictions that supported his thesis. Nonetheless, it is disturbing that after Donohue and I pointed out the coding errors, Lott and his coauthors continued to rely on the flawed data. As Donohue and I said in a response to our initial article, “repeatedly bringing erroneous data into the public debate starts suggesting a pattern of behavior that is unlikely to engender support for the Lott [‘More Guns/Less Crime’] hypothesis.”
We are not the only ones to engage the topic. More than a dozen different authors have exploited the Lott data to reanalyze the issue. In 2004, the National Academy of Sciences entered the debate, conducting a review of the empirical research on firearms and violent crime, including Lott's work. Its panel of experts found: “There is no credible evidence that ‘right-to-carry’ laws, which allow qualified adults to carry concealed handguns, either decrease or increase violent crime.” At least for the moment, this pretty much sums up what many academics feel about the issue.
Lott, however, fights on undaunted. Indeed, John is such a tenacious adversary that I'm a little scared to mention his name in this book. In 2006, Lott took the extraordinary step of suing Steve Levitt for defamation over a single paragraph in Levitt's bestselling book Freakonomics, which said in part:
Lott's admittedly intriguing hypothesis doesn't seem to be true. When other scholars have tried to replicate his results, they found that right-to-carry laws simply don't bring down crime.
Levitt's endnote supported this claim by citing… you guessed it, my article with Donohue that Mary Rosh thought was a joke. Lott's defamation charge depends entirely on the meaning of “replicate.” Lott claims that Levitt was suggesting that Lott falsified his results, that he committed the cardinal sin of “editing the output file.” I find it shocking that Lott brought this suit, especially since Donohue and I couldn't replicate some of his results once we corrected his clear coding errors (errors, by the way, that Lott himself has conceded).
Thankfully, the district court has dismissed the Freakonomics claim. Early in 2007, Judge Ruben Castillo found that the term “replicate” was susceptible to non-defamatory meanings. The judge pointed to the same endnote citing Ayres and Donohue, saying that it clarified “the intended definition of the term ‘replicate’ to be simply that other scholars have disproved Lott's gun theory, not that they proved Lott falsified his data.”
But What If It's Wrong?
The Lott saga has important lessons for Super Crunchers. First, Lott should be applauded for his exemplary sharing of data. Even though Lott's reputation has been severely damaged by the Mary Rosh incident and a host of other concerns, his open-access policy has contributed to a new sharing ethic among data crunchers. I, for one, now share data whenever I legally can. And several journals, including my own Journal of Law, Economics, and Organization, now require data sharing (or an explanation of why you can't share). Donohue and I would never have been able to evaluate Lott's work if he had not led the way by giving us the dataset that he worked on.