Data and Goliath

Bruce Schneier

We saw this problem with the NSA’s eavesdropping program: the false positives overwhelmed
the system. In the years after 9/11, the NSA passed to the FBI thousands of tips per
month; every one of them turned out to be a false alarm. The cost was enormous, and
ended up frustrating the FBI agents who were obligated to investigate all the tips.
We also saw this with the Suspicious Activity Reports—or SAR—database: tens of thousands
of reports, and no actual results. And all the telephone metadata the NSA collected
led to just one success: the conviction of a taxi driver who sent $8,500 to a Somali
group that posed no direct threat to the US—and that was probably trumped up so the
NSA would have better talking points in front of Congress.
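
To see why false alarms swamp a system like this, it helps to run the numbers. Here is a
back-of-the-envelope sketch in Python; the figures are mine and purely illustrative, not
the NSA's, but the base-rate arithmetic is the point: when actual plotters are vanishingly
rare, even a very accurate detector flags almost nothing but innocent people.

    # Base-rate sketch with assumed, illustrative numbers (not from the book).
    population = 300_000_000       # people whose data is swept up
    plotters = 500                 # actual terrorists hidden among them
    detection_rate = 0.99          # the system flags 99% of real plotters
    false_positive_rate = 0.001    # and wrongly flags 0.1% of everyone else

    true_alarms = plotters * detection_rate                           # ~495
    false_alarms = (population - plotters) * false_positive_rate     # ~300,000

    share_real = true_alarms / (true_alarms + false_alarms)
    print(f"Alarms that point at real plotters: {share_real:.2%}")    # ~0.16%

Even with these generously optimistic accuracy figures, more than 99.8 percent of the tips
are false alarms, and every one of them has to be chased down.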

The second problem with using data-mining techniques to try to uncover terrorist plots
is that each attack is unique. Who would have guessed that two pressure-cooker bombs
would be delivered to the Boston Marathon finish line in backpacks by a Boston college
kid and his older brother? Each rare individual who carries out a terrorist attack
will have a disproportionate impact on the criteria used to decide who’s a likely
terrorist, leading to ineffective detection strategies.

The third problem is that the people the NSA is trying to find are wily, and they’re
trying to avoid detection. In the world of personalized marketing, the typical surveillance
subject isn’t trying to hide his activities. That is not true in a police or national
security context. An adversarial relationship makes the problem much harder, and means
that most commercial big data analysis tools just don't work. A commercial tool can
simply ignore people
trying to hide and assume benign behavior on the part of everyone else. Government
data-mining techniques can’t do that, because those are the very people they’re looking
for.

Adversaries vary in the sophistication of their ability to avoid surveillance. Most
criminals and terrorists—and political dissidents, sad to say—are pretty unsavvy and
make lots of mistakes. But that’s no justification for data mining; targeted surveillance
could potentially identify them just as well. The question is whether mass surveillance
performs sufficiently better than targeted surveillance to justify its extremely high
costs. Several analyses of all the NSA’s efforts indicate that it does not.

The three problems listed above cannot be fixed. Data mining is simply the wrong tool
for this job, which means that all the mass surveillance required to feed it cannot
be justified. When he was NSA director, General Keith Alexander argued that ubiquitous
surveillance would have enabled the NSA to prevent 9/11. That seems unlikely. He wasn’t
able to prevent the Boston Marathon bombings in 2013, even though one of the bombers
was on the terrorist watch list and both had sloppy social media trails—and this was
after a dozen post-9/11 years of honing techniques. The NSA collected data on the
Tsarnaevs before the bombing, but hadn’t realized that it was more important than
the data it collected on millions of other people.

This point was made in the 9/11 Commission Report. That report described a failure
to “connect the dots,” which proponents of mass surveillance claim requires collection
of more data. But what the report actually said was that the intelligence community
had all the information about the plot without mass surveillance, and that the failures
were the result of inadequate analysis.

Mass surveillance didn’t catch underwear bomber Umar Farouk Abdulmutallab in 2009,
even though his father had repeatedly warned the US government that he was dangerous.
And the liquid bombers (they’re the reason governments prohibit passengers from bringing
large bottles of liquids, creams, and gels on airplanes in their carry-on luggage)
were captured in 2006 in their London apartment not due to mass surveillance but through
traditional investigative police work. Whenever we learn about an NSA success, it
invariably comes from targeted surveillance
rather than from mass surveillance. One analysis showed that the FBI identifies potential
terrorist plots from reports of suspicious activity, reports of plots, and investigations
of other, unrelated, crimes.

This is a critical point. Ubiquitous surveillance and data mining are not suitable
tools for finding dedicated criminals or terrorists. We taxpayers are wasting billions
on mass-surveillance programs, and not getting the security we’ve been promised. More
importantly, the money we’re wasting on these ineffective surveillance programs is
not being spent on investigation, intelligence, and emergency response: tactics that
have been proven to work.

Mass surveillance and data mining are much more suitable for tasks of population discrimination:
finding people with certain political beliefs, people who are friends with certain
individuals, people who are members of secret societies, and people who attend certain
meetings and rallies. Those are all individuals of interest to a government intent
on social control, like China’s. The reason data mining works to find them is that, like
credit card fraudsters, political dissidents are likely to share a well-defined profile.
Additionally, under authoritarian rule the inevitable false alarms are less of a problem;
charging innocent people with sedition instills fear in the populace.

More than just being ineffective, the NSA’s surveillance efforts have actually made
us less secure. In order to understand how, I need to explain a bit about Internet
security, encryption, and computer vulnerabilities. The following three sections are
short but important.

INTERNET ATTACK VERSUS DEFENSE

In any security situation, there’s a basic arms race between attack and defense. One
side might have an advantage for a while, and then technology changes and gives the
other side an advantage. And then it changes back.

Think about the history of military technology and tactics. In the early 1800s, military
defenders had an advantage; charging a line was much more dangerous than defending
it. Napoleon first figured out how to attack effectively using the weaponry of the
time. By World War I, firearms—particularly the machine gun—had become so powerful
that the defender
again had an advantage; trench warfare was devastating to the attacker. The tide turned
again in World War II with the invention of blitzkrieg warfare, and the attacker again
gained the advantage.

Right now, both on the Internet and with computers in general, the attacker has the
advantage. This is true for several reasons.

•  It’s easier to break things than to fix them.

•  Complexity is the worst enemy of security, and our systems are getting more complex
all the time.

•  The nature of computerized systems makes it easier for the attacker to find one
exploitable vulnerability in a system than for the defender to find and fix all
vulnerabilities in the system.

•  An attacker can choose a particular attack and concentrate his efforts, whereas
the defender has to defend against every possibility.

•  Software security is generally poor; we simply don’t know how to write secure software
and create secure computer systems. Yes, we keep improving, but we’re still not doing
very well.

•  Computer security is very technical, and it’s easy for average users to get it
wrong and subvert whatever security they might have.

This isn’t to say that Internet security is useless; far from it. Attack might be
easier, but defense is still possible. Good security makes many kinds of attack harder,
more costly, and more risky. Against an attacker who isn’t sufficiently skilled, good
security may protect you completely.

In the security field, we think in terms of risk management. You identify what your
risk is, and what reasonable precautions you should take. So, as someone with a computer
at home, you should have a good antivirus program, turn automatic updates on so your
software is up-to-date, avoid dodgy websites and e-mail attachments from strangers,
and keep good backups. These plus several more essential steps that are fairly easy
to implement will leave you secure enough against common criminals and hackers. On
the other hand, if you’re a political dissident in China, Syria, or Ukraine trying
to avoid arrest or assassination, your precautions must be more comprehensive. Ditto
if you’re a criminal trying to evade the police, a businessman trying to prevent corporate
espionage, or a government embassy trying to thwart military espionage. If you’re
particularly
concerned about corporations collecting your data, you’ll need a different set of
security measures.

For many organizations, security comes down to basic economics. If the cost of security
is less than the likely cost of losses due to lack of security, security wins. If
the cost of security is more than the likely cost of losses, accept the losses. For
individuals, a lot of psychology mixes in with the economics. It’s hard to put a dollar
value on a privacy violation, or on being put on a government watch list. But the
general idea is the same: cost versus benefit.
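
As a rough sketch of that calculation, here is the comparison in Python; the numbers are
assumed and purely illustrative:

    # Cost-benefit sketch with assumed, illustrative figures.
    cost_of_security = 50_000       # annual cost of a security measure, in dollars
    chance_of_breach = 0.05         # estimated probability of a breach this year
    loss_if_breached = 2_000_000    # estimated cost of that breach

    expected_loss = chance_of_breach * loss_if_breached    # $100,000
    if cost_of_security < expected_loss:
        print("Security wins: the measure costs less than the expected loss.")
    else:
        print("Accept the loss: the measure costs more than it is likely to save.")

Organizations run some version of this comparison, explicitly or implicitly, for every
control they consider; for individuals, the hard part is that some of the losses resist
being priced in dollars at all.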

Of critical import to this analysis is the difference between random and targeted
attacks.

Most criminal attacks are opportunistic. In 2013, hackers broke into the network of
the retailer Target Corporation and stole credit card and other personal information
belonging to 40 million people. It was the biggest known breach of its kind at the
time, and a catastrophe for the company—its CEO, Gregg Steinhafel, resigned over the
incident—but the criminals didn’t specifically pick Target for any ideological reasons.
They were interested in obtaining credit card numbers to commit fraud, and any company’s
database would have done. If Target had had better security, the criminals would have
gone elsewhere. It’s like the typical home burglar. He wants to rob a home. And while
he might have some selection criteria as to neighborhood and home type, he doesn’t
particularly care which one he chooses. Your job as a homeowner is to make your home
less attractive to the burglar than your neighbor’s home. Against undirected attacks,
what counts as good security is relative.

Compare this with the 2012 attack against the New York Times by Chinese hackers, possibly
ones associated with the government. In this case, the attackers were trying to monitor
reporters’ communications with Chinese dissidents. They specifically targeted the New
York Times’ e-mails and internal network because that’s where the information they wanted was
located. Against targeted attacks, what matters is your absolute level of security.
It is irrelevant what kind of security your neighbors have; you need to be secure
against the specific capabilities of your attackers.

Another example: Google scans the e-mail of all Gmail users, and uses information
gleaned from it to target advertising. Of course, there isn’t a Google
employee doing this; a computer does it automatically. So if you write your e-mail
in some obscure language that Google doesn’t automatically translate, you’ll be secure
against Google’s algorithms—because it’s not worth it to Google to manually translate
your e-mails. But if you’re suddenly under targeted investigation by the FBI, officers
will take the time to translate your e-mails.

Keep this security distinction between mass and targeted surveillance in mind; we’ll
return to it again and again.

THE VALUE OF ENCRYPTION

I just described Internet security as an arms race, with the attacker having an advantage
over the defender. The advantage might be major, but it’s still an advantage of degree.
It’s never the case that one side has some technology so powerful that the other side
can’t possibly win—except in movies and comic books.

Encryption, and cryptography in general, is the one exception to this. Not only is
defense easier than attack; defense is so much easier than attack that attack is basically
impossible.

There’s an enormous inherent mathematical advantage in encrypting versus trying to
break encryption. Fundamentally, security is based on the length of the key; a small
change in key length results in an enormous amount of extra work for the attacker.
The difficulty increases exponentially. A 64-bit key might take an attacker a day
to break. A 65-bit key would take the same attacker twice the amount of time to break,
or two days. And a 128-bit key—which is at most twice the work to use for encryption—would
take the same attacker 2^64 times longer, or about fifty million billion years, to break.
(For comparison, Earth is 4.5 billion years old.)
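
To make that arithmetic concrete, here is a small sketch in Python, assuming (as above) an
attacker who can break a 64-bit key in one day; every additional bit of key length doubles
the work:

    # Brute-force time versus key length, assuming a 64-bit key falls in one day.
    DAYS_PER_YEAR = 365.25

    def years_to_break(key_bits, baseline_bits=64, baseline_days=1.0):
        days = baseline_days * 2 ** (key_bits - baseline_bits)
        return days / DAYS_PER_YEAR

    for bits in (64, 65, 80, 128):
        print(f"{bits}-bit key: about {years_to_break(bits):.3g} years")
    # 64-bit:  ~0.003 years (one day)
    # 65-bit:  ~0.005 years (two days)
    # 80-bit:  ~180 years
    # 128-bit: ~5e16 years, dwarfing Earth's 4.5-billion-year age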

This is why you hear statements like “This can’t be broken before the heat death of
the universe, even if you assume the attacker builds a giant computer using all the
atoms of the planet.” The weird thing is that those are not exaggerations. They’re
just effects of the mathematical imbalance between encrypting and breaking.

At least, that’s the theory. The problem is that encryption is just a bunch of math,
and math has no agency. To turn that encryption math into something that can actually
provide some security for you, it has to be
written in computer code. And that code needs to run on a computer: one with hardware,
an operating system, and other software. And that computer needs to be operated by
a person and be on a network. All of those things will invariably introduce vulnerabilities
that undermine the perfection of the mathematics, and put us back in the security
situation discussed earlier—one that is strongly biased towards attack.

The NSA certainly has some classified mathematics and massive computation capabilities
that let it break some types of encryption more easily. It built the Multiprogram
Research Facility in Oak Ridge, Tennessee, for this purpose. But advanced as the agency’s
cryptanalytic capabilities are, we’ve learned from Snowden’s documents that it largely
uses those other vulnerabilities—in computers, people, and networks—to circumvent
encryption rather than tackling it head-on. The NSA hacks systems, just as Internet
criminals do. It has its Tailored Access Operations group break into networks and
steal keys. It exploits bad user-chosen passwords, and default or weak keys. It obtains
court orders and demands copies of encryption keys. It secretly inserts weaknesses
into products and standards.
