The Big Data Fallacy
A minor but telling example of this mind-set was the reaction to a fatal mass shooting at Fort Hood, Texas, on April 2, 2014. Immediately after the incident came the usual discussion of how the tragedy could have been prevented. Chris Poulin, one of the founders of the Durkheim Project, which received $1.8 million from the Pentagon's Defense Advanced Research Projects Agency (DARPA), was sure he knew. It was all a matter of "finding and processing predictive signals in available data, whether those signals are overt or deeply hidden." Poulin had published a paper shortly before the incident claiming that suicidal (and possibly homicidal) motivation is predictable on the basis of language, even when suicide is not mentioned. Sounding like a character from the movie Minority Report, Poulin went on to say "the technology exists" to predict future events, but "we haven't worked out the civil liberties and the policing procedures to use predictive analytics effectively."[15]
It is indeed the case that in predicting homicides and suicides, just as in predicting terrorist attacks or in "signature" targeting of potential terrorists for assassination, we haven't worked out the civil liberties issues, nor can we, for reasons linked to the limits of human knowledge and the methods by which we apply our assumed rather than definite knowledge to solve problems. The critical requirement is to eliminate "false positives," whether in detecting suicide risks, terrorist suspects, or potential military targets. The default solution in all of these cases is more data and more sifting by means of additional search criteria, which not only generates increasingly complex (and more expensive) technological solutions from which contractors obviously benefit, but also drives more data collection. That inevitably means a vicious cycle of more intrusion into personal privacy. It should not surprise us that something as benign-sounding as suicide prevention requires a near-Orwellian scale of "big data."
The biggest flaw in the processing of big data is that it may not work. The algorithms on which many of these programs are based are some variant of Bayes' theorem, an eighteenth-century mathematical formula that assigns probabilities to outcomes based on prior conditions. The theorem works accurately in a classic coin-toss game (where, for example, it is known in advance that two coins are fair and one is a trick coin that always lands on "heads"), because the prior probability is exactly known and the eventual outcome of the coin tosses is beyond dispute. But if fallible human actors (in this case, the government and its contractors) assign prior probabilities based on data that is itself incomplete, faulty, or cherry-picked, then the prior conditions become skewed. And given that the behavior of the target is incompletely known and capable of change (unlike the coins), the Bayesian algorithm breaks down.
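To make the false-positive problem concrete, consider a back-of-the-envelope application of Bayes' theorem, P(A|B) = P(B|A)P(A)/P(B). The sketch below is hypothetical: the hit rate, false-alarm rate, and prevalence are invented for illustration and are not drawn from any actual screening program. The point is structural: when the condition being hunted is rare, even a highly accurate screen produces almost nothing but false positives.

```python
# Hypothetical illustration of the base-rate problem in threat screening.
# All of the numbers are invented for the sake of the example.

def posterior(prior, hit_rate, false_alarm_rate):
    """Bayes' theorem: P(genuine threat | flagged by the screen)."""
    p_flagged = hit_rate * prior + false_alarm_rate * (1 - prior)
    return (hit_rate * prior) / p_flagged

prior = 1 / 100_000        # assume 1 person in 100,000 is a genuine threat
hit_rate = 0.99            # the screen catches 99% of real threats
false_alarm_rate = 0.01    # and wrongly flags 1% of innocent people

p = posterior(prior, hit_rate, false_alarm_rate)
print(f"P(threat | flagged) = {p:.4%}")  # roughly 0.099%

# In other words, more than 99.9% of the people this very accurate screen
# flags are innocent -- which is why the default response, collecting still
# more data to whittle down the false positives, never ends.
```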
These faulty assumptions are why (besides inevitable coding and translation errors) innocent people end up on no-fly lists.* It is also why Hellfire missiles may hit a wedding party rather than an al-Qaeda operative, and why automatic target recognition software might cause a bomb to hit an orphanage rather than a telephone exchange. Because the Deep State is so wedded to Bayesianism as a method of organization and control, the dystopia into which we are heading may look less like Minority Report and more like Terry Gilliam's black comedy Brazil: a ramshackle authoritarian state where nothing works as intended, and where innocent people are incarcerated and tortured because of silly bureaucratic errors.
Why is the Bayesian approach so seductive? It is not simply that it has become technically easier to implement as computers have grown so powerful. It is significant that it became so prevalent after 9/11. In the six-month period leading up to the terrorist strike, there was plenty of evidence suggesting an attack was in the offing: "the system was blinking red," as the 9/11 Commission admitted. Failure to anticipate the attacks was primarily a policy failure of senior officials, including the president, whose behavior during the month prior to the attack verged on willful negligence. According to Robert Nisbet's "no fault" doctrine, failure at the top is rarely if ever admitted. The failures had to be pushed lower down, and so the blame landed on the worker bees in the agencies who allegedly "failed to connect the dots."
The solution to this imaginary problem not only entailed vast government reorganization and expansion at huge expense (and lucrative opportunities for contractors to get in on the gold rush), but also ensured that something like big data, with a patina of scientific rationality, would come in its wake. Mathematical algorithms provide the deceptive allure of detached objectivity when making judgment calls; the human element is removed and the supposed precision of high technology substituted. Government officials for the most part have no more idea of the limitations of Bayesian algorithms than they do of quantum physics or string theory, so they are easy marks for contractors with ingenious proposals.
One thing these officials do understand is that a portion of government money spent on government contracts gets recycled into campaign contributions, and postgovernment careers beckon. Creepy phrases like Total Information Awareness, the name of a 2003 DOD big data program (administered by Admiral John Poindexter, previously convicted on five counts of lying and obstruction in the Iran-Contra scandal),* are also very appealing to those people in government with an authoritarian mind-set who crave Zeus-like powers. But saying that we can predict the future is as ridiculous as saying the Titanic was unsinkable.
Despite these limitations, the NSA's insatiable appetite for collecting data proceeds apace. According to a Washington Post study of the 160,000 NSA records of intercepted emails the newspaper obtained from Edward Snowden, only about 11 percent were communications of targeted individuals; all the rest were third parties with no connection to an investigation. Therefore, in just this small sample of the NSA's prodigious vacuuming of communications, the daily lives of more than 10,000 people were cataloged and recorded. The Post described the content as telling "stories of love and heartbreak, illicit sexual liaisons, mental-health crises, political and religious conversions, financial anxieties and disappointed hopes."[16] The material includes more than 5,000 private photos. All of this personal data results from the incidental collection of information from innocent third parties in the course of the spy agency's search for a targeted individual. Anyone participating in an online chat room visited by a target, or even just reading the discussion, would be swept up in the data dragnet.
The NSA is not the only player at this game. Thanks to a Freedom of Information Act lawsuit, the Electronic Frontier Foundation discovered that the FBI's "next generation" facial recognition program would have as many as 52 million photographs in it by 2015, including millions that were recorded for "non-criminal purposes." The bureau's massive biometric database already "may hold records on as much as one third of the U.S. population."
Why is the collection of personal communications so indiscriminate, and why is it that government agencies cannot be more mindful of Americans' privacy? Thomas Drake, a former NSA employee, told me it was because "the NSA is addicted to data." Consistent with the information revolution, data has become the currency of the government, and particularly of the national security state. "Data is power," he said. Yet this addiction leads to some perverse results, not least of which is the violation of citizens' Fourth Amendment rights. It can also impede, rather than assist, the national security state's ostensible mission of "keeping us safe."
I asked Drake why, if his omniscient former agency really claims to know where the sparrow falls anywhere on earth and in real time, it took the federal government twelve days to inform Target that its systems were being hacked, and why, seven months later, the perpetrators had not been caught. "The NSA is incentivized to collect data, not to prevent cyberintrusions," he said. "In any case, they have little incentive to share information with the other agencies. They would prefer to keep the data flowing and collect it rather than interrupt it by triggering a law enforcement action."[17] So much for keeping us safe.
The Unintended Consequences of Surveillance
In much the same way that the CIA's covert arming of the Afghan Mujahidin in the 1980s created sharply negative blowback in the form of mutating and evolving terrorist groups, so has the NSA's pervasive surveillance led to negative outcomes. The Center for Strategic and International Studies has estimated the annual cost of cybercrime and economic espionage to the global economy at more than $445 billion: about 1 percent of gross world product.[18] This activity ranges from sophisticated state-backed efforts, like those of China, to extensive private rings of cybercriminals stealing credit card data, to individual hackers. Obviously, cybercrime is a major global problem, and it would exist with or without the NSA. But some unquantifiable portion of that activity may be due to the agency's practice of creating cyberimplants that disable security and encryption software, allowing the agency complete access to a target computer.
William Binney, a former top code-breaker and later whistle-blower at the NSA, told me that these implants are a two-edged sword. He said he has argued for the U.S. government "not to do backdoors or weaken encryption and firewalls or operating systems. The NSA goes around thinking they are the only ones who can see and exploit these weaknesses. This is of course false. What they have done and continue to do is make all of us more vulnerable to hackers and other governments cracking into our data centers and networks. And, by placing more than fifty thousand implants in the worldwide network, they have put in place a potential for others (including governments) first to isolate these implants, then analyze the code, and in turn discover how to manipulate them. This would mean that others can take over those implants and start to use them as their own. This I have called shortsighted and finite thinking."[19]
It is unclear exactly to what extent these implants and backdoors are something the agency creates on its own and covertly disseminates without the knowledge or consent of the technology companies, and to what extent the NSA coerces the companies either to accept them or to develop backdoors of their own for the agency to exploit. A third possibility is a regime of voluntary cooperation between government and industry, with the industry even receiving contracts to weaken the security of its own products. As with other aspects of the Silicon Valley–NSA relationship, the technology industry is extremely reluctant to acknowledge any collaboration.
Reuters reported in late 2013 that the computer security firm RSA accepted a deliberately weakened algorithm from the NSA and distributed it very widely by incorporating it into software ostensibly used to safeguard security in personal computers and other devices. According to the news service, RSA received $10 million to establish the NSA algorithm as the default method for random number generation in its BSAFE encryption software.[20]
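To see why a rigged random number generator undermines everything built on top of it, consider the minimal sketch below. It is purely illustrative: the real concern with the NSA-designed generator involved hidden constants that could let an observer recover the generator's internal state, which is modeled here, crudely, as an attacker simply knowing the seed. Nothing in this sketch is RSA's actual BSAFE code.

```python
# Hypothetical sketch: a "secret" key derived from a generator whose
# internal state is known to an attacker is not secret at all.
import random

def generate_key(rng: random.Random, bits: int = 128) -> int:
    """Derive a key from the generator's output stream."""
    return rng.getrandbits(bits)

# The victim believes this key is unpredictable...
victim_rng = random.Random(20131220)  # seed stands in for the backdoored state
secret_key = generate_key(victim_rng)

# ...but anyone who knows the generator's internal state (the whole
# point of a backdoor) reproduces the identical key.
attacker_rng = random.Random(20131220)
assert generate_key(attacker_rng) == secret_key
print("attacker reproduced the key:", hex(secret_key))
```

The encryption wrapped around such a key can be mathematically flawless and still be worthless, because the attacker never has to break it; he simply regenerates the key.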
This sort of relationship is hardly surprising: as we have already seen, Silicon Valley start-ups routinely received money from the government. The New York Times has reported that the CIA pays AT&T $10 million a year for access to data on overseas phone calls.[21]
The uproar resulting from Snowden's disclosures led to a predictable sequence of responses by the White House. First, denial that there is a problem: everything that has been done is within the law. Second, when the hullabaloo fails to subside, the president and his spokesmen profess to welcome a vibrant debate by the American people. Finally, when the legislative gears at last begin to turn, the administration offers a proposal that is called reform but whose substance is its antithesis. Such was the legislative solution proposed by the White House in May 2014. Responding to concerns that the NSA was collecting and storing vast amounts of telecommunications metadata, the administration suggested that the telecoms could maintain the data themselves. The proposal was structured in such a way that the NSA could legally obtain even more metadata than it sweeps up now. As one anonymous intelligence official told the New York Times, "It's a pretty good trade. All told, if you are an NSA analyst, you will probably get more of what you wanted to see, even if it's more cumbersome."[22]
Corporations Are Spying Too
Government surveillance is only the tip of the iceberg of what Americans must contend with. During the 1990s, tech moguls and their fan club of technology writers propounded glowing and idealistic predictions about how the Internet would democratize everything it touched and empower everybody. A notable one was Howard Rheingold's 1993 book The Virtual Community, in which he described his "utopian vision" of the Internet as an "electronic agora" with "democratizing potential." Although Rheingold had the good sense to temper that prediction with a warning against corporate commodification of the new information technologies, few of the prognosticators of that time were able to imagine the ability of the Internet, cell phones, and global positioning systems to keep track of citizens' every move, whether at the order of the government or for a tech industry seeking to commercialize all of our private behavior.
As for high technology's democratizing properties, the example of China should throw cold water on that notion. The country has over a half-billion Internet users, but its political leadership has succeeded in controlling the content that Internet subscribers may see and is carefully policing the web for signs of nonapproved opinions. Although it is true that electronic social media may have played a role in the so-called Arab Spring of 2011, the results show that while it may have helped channel discontent, the movements that played a significant role in challenging despotic regimes were not necessarily aiming at democracy in the traditional Western sense. In any case, the affected governments, like Egypt's, were perfectly capable of shutting down nonapproved Internet sites and monitoring cell phone traffic.