Authors: Ian Ayres
The matching sites, meanwhile, are starting to compete on validating their claims. True.com emphasizes that it is the only site which had its methodology certified by an independent auditor. True.com's chief psychologist James Houran is particularly dismissive of eHarmony's data claims. “I've seen no evidence they even conducted any study that forms the basis of their test,” Houran says. “If you're touting that you're doing something scientific…you inform the academic community.”
eHarmony is responding by providing some evidence that its matching system works. It sponsored a Harris poll suggesting that eHarmony is now producing about ninety marriages a day (that's over 30,000 a year). This is better than nothing, but it's only a modest success: with more than five million members, these marriages represent only about a 1 percent chance that your $50 fee will produce a walk down the aisle. The competitors are quick to dismiss the marriage number. Yahoo!'s Thompson has said you have a better chance of finding your future spouse if you “go hang out at the Safeway.”
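The arithmetic behind that 1 percent figure is easy to check. The sketch below is only a rough back-of-the-envelope calculation: it assumes exactly ninety marriages a day and exactly five million members, and because the poll figure does not say whether both spouses count as members, it computes the rate both ways. The function names are mine, chosen for illustration.

```python
# Rough check of the "about 1 percent" claim.
# Assumptions (not from the Harris poll): exactly 90 marriages per day,
# exactly 5,000,000 members, and a 365-day year.
MARRIAGES_PER_DAY = 90
MEMBERS = 5_000_000


def annual_marriages(per_day: int = MARRIAGES_PER_DAY) -> int:
    """Marriages per year implied by a daily rate."""
    return per_day * 365


def marriage_rate(members_per_marriage: int, members: int = MEMBERS) -> float:
    """Share of members who marry in a year.

    members_per_marriage is 1 if only one spouse is counted as a member,
    2 if both spouses are members.
    """
    return annual_marriages() * members_per_marriage / members


print(annual_marriages())            # marriages per year
print(f"{marriage_rate(1):.2%}")     # one member per marriage
print(f"{marriage_rate(2):.2%}")     # two members per marriage
```

Either way the answer lands in the neighborhood of 1 percent, which is why the competitors find the number so easy to dismiss.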
eHarmony also claims that it has evidence that its married couples are in fact more compatible. Its researchers presented last year to the American Psychological Society their finding that married couples who found each other through eHarmony were significantly happier than couples married for a similar length of time who met by other means. There are some serious weaknesses with this study, but the big news for me is that the major matching sites are not just Super Crunching to develop their algorithms; they're Super Crunching to prove that their algorithms got it right.
The matching algorithms of these services aren't, however, completely data-driven. All the services rely at least partially on the conscious preferences of their clients (regardless of whether these preferences are valid predictors of compatibility). eHarmony allows clients to discriminate on the race of potential mates. Even though it's only acting on the wishes of its clients, matching services that discriminate by race may violate a statute dating back to the Civil War that prohibits race discrimination in contracting. Think about it. eHarmony is a for-profit company that takes $50 from black clients and refuses to treat them the same (match them with the same people) as some white clients. A restaurant would be in a lot of trouble if it refused to seat Hispanic customers in a section where customers had stated a preference to have “Anglos only.”
eHarmony has gotten into even more trouble for its refusal to match same-sex couples. The founder's wife and senior vice president, Marylyn Warren, has claimed that “eHarmony is meant for everybody. We do not discriminate in any way.” This is clearly false. They would refuse to match two men even if, based on their answers to the company's 436 questions, the computer algorithm picked them to be the most compatible. There's a sad irony here. eHarmony, unlike its competitors, insists that similar people are the best matches. When it comes to gender, it insists that opposites attract. Out of the top ten matching sites, eHarmony is the only one that doesn't offer same-sex matching.
Why is eHarmony so out of step? Its refusal to match gay and lesbian clients, even in Massachusetts where same-sex marriage is legal, seems counter to the company's professed goal of helping people find lasting and satisfying marriage partners. Warren is a self-described “passionate Christian” who for years worked closely with James Dobson's Focus on the Family. eHarmony is only willing to facilitate certain types of legal marriages regardless of what the statistical algorithm says. In fact, because the algorithm is not public, it is possible that eHarmony puts a normative finger on the scale to favor certain clients.
But the big idea behind these new matching services, the insight they all share, is that data-based decision making doesn't need to be limited to the conscious preferences of the masses. Instead, it is possible to study the results of decisions and tease out from inside the data the factors that lead to success. This chapter is about how simple regressions are changing decisions by improving predictions. By sifting through aggregations of data, the regression technique can uncover the levers of causation that are hidden to casual and even expert observation. And even when experts feel that a particular factor is an important determinant of some outcome, the regression technique literally can price it out.
Just for fun, Garth Sundem, in his book Geek Logik, used a regression to create a formula to predict how long celebrity marriages will last. (It turns out that having more Google hits reduces a marriage's chances, especially if the top Google hits include sexually suggestive photos!) eHarmony, Perfectmatch, and True.com are doing the same kind of thing, but they're doing it for profit. These services are engaged in a new kind of Super Crunching competition. The game's afoot and it's a very different kind of game.
Harrah's Feels Your Pain
The same kind of statistical matchmaking is also happening inside companies like Lowe's and Circuit City, which are using Super Crunching to select job applicants. Employers want to predict which job applicants are going to make a commitment to their job. Unlike traditional aptitude tests that try to suss out an applicant's IQ, the modern tests are much closer to eHarmony's questionnaire in trying to evaluate three underlying personality traits of the applicants: their conscientiousness, agreeableness, and extroversion. Data mining shows that these personality traits are better predictors of worker productivity (especially turnover) than more traditional ability testing. Barbara Ehrenreich was appalled when she took an employment test at a Minneapolis Wal-Mart and was told that she had given the wrong answer when she agreed with the proposition “there is room in every corporation for a non-conformist.” Yet regressions suggest that people who think Wal-Mart is for non-conformists aren't a good fit and are more likely to turn over. It's one thing to argue that Wal-Mart and other employers should reorganize their mind-numbing jobs to make them less boring. But in a world where mind-numbing jobs are legal, it's hard for me to see what's wrong with a statistically validated test that helps match employees that are most compatible with those jobs.
Mining for non-obvious predictors is not just about hiring good applicants. It's also helping businesses keep their costs down, especially the costs of stagnant inventory. Businesses that can do a better job of predicting demand can do a better job of predicting when they are about to run out of something. And it can be just as important for businesses to know when they're not about to run out of something. Instead of bearing the costs of large inventories lying around, Super Crunching allows firms to move to just-in-time purchasing. Stores like Wal-Mart and Target try to get as close as possible to having no excess inventory on hand at all. “What they have on the shelf is what they've got,” said Scott Gnau, general manager of the data-mining company Teradata. “If I buy six cans of yellow corn off the shelf, and there are now three cans left, somebody knows that happened immediately so they can make sure that the truck coming my way gets some more corn loaded on it. It's gotten to the point that as you're putting stuff in your trunk, the store is loading the truck at the distribution center.” These prediction strategies can be based on highly specific details about likely demand. Before Hurricane Ivan hit Florida in 2004, Wal-Mart already had started rushing strawberry Pop-Tarts to stores in the hurricane's path. Analyzing sales of other stores in areas hit by hurricanes, Wal-Mart was able to predict that people would be yearning for the gooey comfort of Pop-Tarts, finger food that doesn't require cooking or refrigeration. Firms are engaging in “analytic competition” in an explicit attempt to out-data-mine the other guy, struggling to first uncover and then exploit the hidden determinants of profitability.
Some of this Super Crunching is done in-house, but truly large datasets are warehoused and analyzed by specialist firms like Teradata, which manages literally terabytes of data. Sixty-five percent of the top worldwide retailers (including Wal-Mart and JCPenney) use Teradata. More than 70 percent of airlines and 40 percent of banks are its customers.
Crunching terabytes helps predict which customers are likely to defect to rivals. For its most profitable customers, Continental Airlines keeps track of every negative experience that may increase the chance of defection. The next time a customer who experienced a bad flight takes to the air, a data-mining program automatically kicks in and gives the crew a heads-up. Kelly Cook, Continental's onetime director of customer relationship management, explains, “Recently, a flight attendant walked up to a customer flying from Dallas to Houston and said, 'What would you like to drink? And, oh, by the way, I am so sorry we lost your bag yesterday coming from Chicago.' The customer flipped.”
UPS uses a more sophisticated algorithm to predict when a customer is likely to switch to another shipping company. The same kind of regression formula that we saw at play with wines and matchmaking is used to predict when a customer's loyalty is at risk, and UPS kicks into action even before the customer has thought about switching. A salesperson proactively calls the customer to reestablish the relationship and resolve potential problems, dramatically reducing the loss of accounts.
Harrah's casinos are particularly sophisticated at predicting how much money they can extract from clients and still retain their business. Harrah's “Total Rewards” customers use a swipeable electronic card that lets Harrah's capture information on every game played at every Harrah's casino they've visited. Harrah's knows in real time on a hand-by-hand (or slot-by-slot) basis how much each player is winning or losing. It combines these gambling data together with other information such as the customer's age and the average income in the area where he or she lives, all in a data warehouse.
Harrah's uses this information to predict how much a particular gambler can lose and still enjoy the experience enough to come back for more. It calls this magic number the “pain point.” And once again, the pain point is calculated by plugging customer attributes into a regression formula. Given that Shelly, who likes to play the slots, is a thirty-four-year-old white female from an upper-middle-class neighborhood, the system might predict her pain point for an evening of gambling is a $900 loss. As she gambles, if the database senses that Shelly is approaching $900 in slot losses, a “luck ambassador” is dispatched to pull her away from the machine.
“You come in, swipe your card, and are sitting at a slot,” Teradata's Gnau said. “When you get close to that pain point, they come out and say, 'I see you're having a rough day. I know you like our steakhouse. Here, I'd like you to take your wife to dinner on us right now.' So it's no longer pain. It becomes a good experience.”
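Harrah's has not published its formula, so what follows is only a toy sketch of how a pain point could be priced out with a regression. Everything in it is invented for illustration: the handful of historical data points, the use of a single predictor (average neighborhood income) where the real system would plug in many attributes, and the 90 percent margin at which the luck ambassador is sent out.

```python
# Toy pain-point model: one-variable ordinary least squares.
# HYPOTHETICAL data: (average neighborhood income in $1,000s,
# evening loss in dollars beyond which similar customers stopped returning).
HISTORY = [(40, 300), (60, 500), (80, 700), (100, 900), (120, 1100)]


def fit_line(points):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_pain_point(income_k, intercept, slope):
    """Predicted loss (in dollars) at which the evening stops being fun."""
    return intercept + slope * income_k


def should_send_ambassador(losses_so_far, pain_point, margin=0.9):
    """True once losses reach the assumed margin (90%) of the pain point."""
    return losses_so_far >= margin * pain_point


intercept, slope = fit_line(HISTORY)
shelly_pain_point = predict_pain_point(100, intercept, slope)
print(shelly_pain_point)
print(should_send_ambassador(850, shelly_pain_point))
```

The point of the sketch is the division of labor: the regression is fitted once on past customers, and the floor system then only has to compare a running loss total against one predicted number per gambler.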
To some, this kind of manipulation is the science of diabolically separating as many dollars from a customer as possible on a repeated basis. To others, it is the science of improving customer satisfaction and loyalty, and of making sure the right customers get rewarded. It's actually a bit of both. I'm troubled that Harrah's is making what can be an addictive and ruinous experience even more pleasurable. But because of Harrah's pain-point predictions, its customers tend to leave happier.
The Harrah's strategy of targeting benefits is being adopted in different retail markets. Teradata found, for example, that one of its airline clients was giving perks to its frequent fliers based solely on how many miles they flew each year, with Platinum customers getting the most benefits. But the airline hadn't taken account of how profitable these customers were. They didn't plug in other available information, such as how much Platinum fliers paid for tickets, where they bought them, whether they called customer service, and most important, whether they traveled on flights where the airline actually made money. After Teradata crunched the numbers taking into account these bottom-line attributes, the airline found out that almost all of its Platinum fliers were unprofitable. Teradata's Scott Gnau summed it up, “So they were giving people an incentive to make them lose money.”
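The airline's mistake is easy to see once mileage and profit are computed side by side. The sketch below is not Teradata's analysis; the four fliers, their dollar figures, and the 75,000-mile Platinum threshold are all made up to show the shape of the calculation.

```python
from dataclasses import dataclass


@dataclass
class Flier:
    name: str
    miles: int
    ticket_revenue: float   # what the flier paid for tickets
    service_cost: float     # call-center bookings, customer-service calls
    flight_cost: float      # cost of carrying the flier on the flights flown

    @property
    def profit(self) -> float:
        return self.ticket_revenue - self.service_cost - self.flight_cost


PLATINUM_MILES = 75_000  # hypothetical threshold for top-tier status

# HYPOTHETICAL customers, invented for illustration only.
FLIERS = [
    Flier("A", 100_000, 12_000, 1_500, 12_500),
    Flier("B", 90_000, 9_000, 500, 9_500),
    Flier("C", 80_000, 20_000, 300, 15_000),
    Flier("D", 30_000, 15_000, 100, 9_000),
]


def platinum(fliers):
    """Fliers who earn top status under a miles-only rule."""
    return [f for f in fliers if f.miles >= PLATINUM_MILES]


def unprofitable_share(fliers):
    """Fraction of the given fliers on whom the airline loses money."""
    return sum(f.profit < 0 for f in fliers) / len(fliers)


def rank_by_profit(fliers):
    """Fliers ordered from most to least profitable."""
    return sorted(fliers, key=lambda f: f.profit, reverse=True)


print([f.name for f in platinum(FLIERS)])
print(unprofitable_share(platinum(FLIERS)))
print([f.name for f in rank_by_profit(FLIERS)])
```

In this invented example the most profitable customer never comes close to Platinum status, which is exactly the kind of mismatch a miles-only rule hides.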
The advent of tera mining means that the era of the free lunch is over. Instead of having more profitable customers subsidizing the less profitable, firms will be able to target rewards to their most profitable customers. But caveat emptor! In this brave new world, you should be scared when a firm like Harrah's or Continental becomes particularly solicitous of your business. It probably means you have been paying too much. Airlines are learning to give upgrades and other favorable treatment to the customers that make them the most money, not just the ones that fly the most. Airlines can then “encourage people to become more profitable,” Gnau explains, by charging you more, for example, for buying tickets through a call center than for buying them online.
This hyper-individualized segmentation of consumers also lets firms offer new personalized services that clearly benefit society. Progressive insurance capitalizes on the new capabilities of data mining to define extremely narrow groups of customers, e.g., motorcycle riders ages thirty and above, with college educations, credit scores over a certain level, and no accidents. For each cell, the company runs regressions to identify factors that most closely correlate with that group's losses. Super Crunching on this radically expanded set of factors lets it set prices on types of consumers who were traditionally written off as uninsurable.
Super Crunching has also created a new science of extraction. Data mining increases firms' ability to charge individualized prices that predict our individualized pain points. If your walk-away price is higher than mine, tera mining will lead firms to take a bigger chunk out of you one way or another. In a Super Crunching world, consumers can't afford to be asleep at the wheel. It's no longer safe to rely on the fact that other consumers care about price. Firms are figuring out more and more sophisticated ways to treat the price-oblivious differently than the price-conscious.