The Information (46 page)

Authors: James Gleick

Tags: #Non-Fiction


Dear Drs. Watson & Crick,

 

I am a physicist, not a biologist.… But I am very much excited by your article in May 30th Nature, and think that brings Biology over into the group of “exact” sciences.… If your point of view is correct each organism will be characterized by a long number written in quadrucal (?) system with figures 1, 2, 3, 4 standing for different bases.… This would open a very exciting possibility of theoretical research based on combinatorix and the theory of numbers!… I have a feeling this can be done. What do you think?

 

For the next decade, the struggle to understand the genetic code consumed a motley assortment of the world’s great minds, many of them, like Gamow, lacking any useful knowledge of biochemistry. For Watson and Crick, the initial problem had depended on a morass of specialized particulars: hydrogen bonds, salt linkages, phosphate-sugar chains with deoxyribofuranose residues. They had to learn how inorganic ions could be organized in three dimensions; they had to calculate exact angles of chemical bonds. They made models out of cardboard and tin plates. But now the problem was being transformed into an abstract game of symbol manipulation. Closely linked to DNA, its single-stranded cousin, RNA, appeared to play the role of messenger or translator. Gamow said explicitly that the underlying chemistry hardly mattered. He and others who followed him understood this as a puzzle in mathematics—a mapping between messages in different alphabets. If this was a coding problem, the tools they needed came from combinatorics and information theory. Along with physicists, they consulted cryptanalysts.

Gamow himself began impulsively by designing a combinatorial code. As he saw it, the problem was to get from the four bases in DNA to the twenty known amino acids in proteins—a code, therefore, with four letters and twenty words.

Pure combinatorics made him think of nucleotide triplets: three-letter words. He had a detailed solution—soon known as his “diamond code”—published in Nature within a few months. A few months after that, Crick showed this to be utterly wrong: experimental data on protein sequences ruled out the diamond code. But Gamow was not giving up. The triplet idea was seductive. An unexpected cast of scientists joined the hunt: Max Delbrück, an ex-physicist now at Caltech in biology; his friend Richard Feynman, the quantum theorist; Edward Teller, the famous bomb maker; another Los Alamos alumnus, the mathematician Nicholas Metropolis; and Sydney Brenner, who joined Crick at the Cavendish.
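The counting behind the triplet idea can be checked directly. The sketch below is an illustration only, not Gamow's own calculation, and the function name `min_word_length` is hypothetical. With four letters, one-letter words give 4 possibilities and two-letter words give 16, both short of twenty; three-letter words give 64, which is more than enough.

```python
def min_word_length(alphabet_size: int, n_words: int) -> int:
    """Smallest word length whose number of possible words covers n_words."""
    if alphabet_size < 2 or n_words < 1:
        raise ValueError("need at least two letters and at least one word")
    length = 1
    # alphabet_size ** length is the number of distinct words of that length.
    while alphabet_size ** length < n_words:
        length += 1
    return length


# Four bases, twenty amino acids: pairs are too few, triplets suffice.
triplet_length = min_word_length(4, 20)
spare_words = 4 ** triplet_length - 20  # triplets left over after twenty are used
```

The surplus of triplets over amino acids is what leaves room for more than one three-letter word per amino acid.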

They all had different coding ideas. Mathematically the problem seemed daunting even to Gamow. “As in the breaking of enemy messages during the war,” he wrote in 1954, “the success depends on the available length of the coded text. As every intelligence officer will tell you, the work is very hard, and the success depends mostly on luck.… I am afraid that the problem cannot be solved without the help of electronic computer.”

Gamow and Watson decided to make it a club: the RNA Tie Club, with exactly twenty members. Each member received a woolen tie in black and green, made to Gamow’s design by a haberdasher in Los Angeles. The game playing aside, Gamow wanted to create a communication channel to bypass journal publication. News in science had never moved so fast. “Many of the essential concepts were first proposed in informal discussions on both sides of the Atlantic and were then quickly broadcast to the cognoscenti,” said another member, Gunther Stent, “by private international bush telegraph.”

There were false starts, wild guesses, and dead ends, and the established biochemistry community did not always go along willingly.

“People didn’t necessarily believe in the code,” Crick said later. “The majority of biochemists simply weren’t thinking along those lines. It was a completely novel idea, and moreover they were inclined to think it was oversimplified.”

They thought the way to understand proteins would be to study enzyme systems and the coupling of peptide units. Which was reasonable enough.

They thought protein synthesis couldn’t be a simple matter of coding from one thing to another; that sounded too much like something a physicist had invented. It didn’t sound like biochemistry to them.… So there was a certain resistance to simple ideas like three nucleotides’ coding an amino acid; people thought it was rather like cheating.

 
 

Gamow, at the other extreme, was bypassing the biochemical details to put forward an idea of shocking simplicity: that any living organism is determined by “a long number written in a four-digital system.”

He called this “the number of the beast” (from Revelation). If two beasts have the same number, they are identical twins.

By now the word code was so deeply embedded in the conversation that people seldom paused to notice how extraordinary it was to find such a thing—abstract symbols representing arbitrarily different abstract symbols—at work in chemistry, at the level of molecules. The genetic code performed a function with uncanny similarities to the metamathematical code invented by Gödel for his philosophical purposes. Gödel’s code substitutes plain numbers for mathematical expressions and operations; the genetic code uses triplets of nucleotides to represent amino acids. Douglas Hofstadter was the first to make this connection explicitly, in the 1980s: “between the complex machinery in a living cell that enables a DNA molecule to replicate itself and the clever machinery in a mathematical system that enables a formula to say things about itself.”

In both cases he saw a twisty feedback loop. “Nobody had ever in the least suspected that one set of chemicals could code for another set,” Hofstadter wrote.

Indeed, the very idea is somewhat baffling: If there is a code, then who invented it? What kinds of messages are written in it? Who writes them? Who reads them?

 
 

The Tie Club recognized that the problem was not just information storage but information transfer. DNA serves two different functions. First, it preserves information. It does this by copying itself, from generation to generation, spanning eons—a Library of Alexandria that keeps its data safe by copying itself billions of times. Notwithstanding the beautiful double helix, this information store is essentially one-dimensional: a string of elements arrayed in a line. In human DNA, the nucleotide units number more than a billion, and this detailed gigabit message must be conserved perfectly, or almost perfectly. Second, however, DNA also sends that information outward for use in the making of the organism. The data stored in a one-dimensional strand has to flower forth in three dimensions. This information transfer occurs via messages passing from the nucleic acids to proteins. So DNA not only replicates itself; separately, it dictates the manufacture of something entirely different. These proteins, with their own enormous complexity, serve as the material of a body, the mortar and bricks, and also as the control system, the plumbing and wiring and the chemical signals that control growth.

The replication of DNA is a copying of information. The manufacture of proteins is a transfer of information: the sending of a message. Biologists could see this clearly now, because the message was now well defined and abstracted from any particular substrate. If messages could be borne upon sound waves or electrical pulses, why not by chemical processes?

Gamow framed the issue simply: “The nucleus of a living cell is a storehouse of information.”

Furthermore, he said, it is a transmitter of information. The continuity of all life stems from this “information system”; the proper study of genetics is “the language of the cells.”

When Gamow’s diamond code proved wrong, he tried a “triangle code,” and more variations followed—also wrong. Triplet codons remained central, and a solution seemed tantalizingly close but out of reach. A problem was how nature punctuated the seemingly unbroken DNA and RNA strands. No one could see a biological equivalent for the pauses that separate letters in Morse code, or the spaces that separate words. Perhaps every fourth base was a comma. Or maybe (Crick suggested) commas would be unnecessary if some triplets made “sense” and others made “nonsense.”

Then again, maybe a sort of tape reader just needed to start at a certain point and count off the nucleotides three by three. Among the mathematicians drawn to this problem were a group at the new Jet Propulsion Laboratory in Pasadena, California, meant to be working on aerospace research. To them it looked like a classic problem in Shannon coding theory: “the sequence of nucleotides as an infinite message, written without punctuation, from which any finite portion must be decodable into a sequence of amino acids by suitable insertion of commas.”

They constructed a dictionary of codes. They considered the problem of misprints.
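The property in the quoted formulation, often described as "comma-free," can be stated as a small test: when any two code words are written side by side, neither of the two out-of-step triplets that straddle the join may itself be a code word. The following is a minimal sketch under assumptions. The alphabet `ACGU`, the function names, and the sample words in the usage note are all hypothetical and are not taken from the dictionary mentioned above.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed four-letter nucleotide alphabet


def is_comma_free(words):
    """True if no out-of-frame triplet across any two adjacent words is a word."""
    code = set(words)
    if any(len(word) != 3 for word in code):
        raise ValueError("every code word must have exactly three letters")
    for first, second in product(code, repeat=2):
        joined = first + second
        # The two misaligned reading frames that overlap the join.
        if joined[1:4] in code or joined[2:5] in code:
            return False
    return True


def rotation_classes():
    """Group triplets by cyclic rotation, skipping AAA, CCC, GGG, UUU."""
    classes = set()
    for letters in product(ALPHABET, repeat=3):
        word = "".join(letters)
        if len(set(word)) == 1:
            continue  # a repeated letter overlaps itself, so it can never be used
        classes.add(frozenset(word[i:] + word[:i] for i in range(3)))
    return classes


# A comma-free code can hold at most one word from each rotation class,
# so the number of classes is an upper bound on its size.
upper_bound = len(rotation_classes())
```

For example, `is_comma_free({"ACG"})` is true, while `is_comma_free({"ACG", "CGA"})` is false, because `ACGACG` contains `CGA` one step out of frame. Counting the rotation classes gives an upper bound of 20 words for such a code over four letters; the sketch establishes only the bound and does not construct a code that reaches it.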

Biochemistry did matter. All the world’s cryptanalysts, lacking petri dishes and laboratory kitchens, would not have been able to guess from among the universe of possible answers. When the genetic code was solved, in the early 1960s, it turned out to be full of redundancy. Much of the mapping from nucleotides to amino acids seemed arbitrary—not as neatly patterned as any of Gamow’s proposals. Some amino acids correspond to just one codon, others to two, four, or six. Particles called ribosomes ratchet along the RNA strand and translate it, three bases at a time. Some codons are redundant; some actually serve as start signals and stop signals. The redundancy serves exactly the purpose that an information theorist would expect. It provides tolerance for errors. Noise affects biological messages like any other. Errors in DNA—misprints—are mutations.
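The reading process described here, three bases at a time between a start signal and a stop signal, can be sketched in a few lines. This is an illustration under assumptions, not a model of the ribosome: the names `CODON_TABLE`, `STOP_CODONS`, and `translate` are hypothetical, and the table is deliberately partial, listing only a handful of well-established assignments from the standard genetic code.

```python
from collections import Counter

# Partial table (assumption: standard genetic code, RNA letters).
CODON_TABLE = {
    "AUG": "Met",  # also serves as the start signal
    "UGG": "Trp",
    "UUU": "Phe", "UUC": "Phe",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "UUA": "Leu", "UUG": "Leu", "CUU": "Leu",
    "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(rna: str) -> list[str]:
    """Read triplets from the first AUG until a stop codon or the end."""
    start = rna.find("AUG")
    if start == -1:
        return []  # no start signal, nothing is translated
    protein = []
    for i in range(start, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon in STOP_CODONS:
            break
        if codon not in CODON_TABLE:
            raise ValueError(f"codon {codon} is outside this partial table")
        protein.append(CODON_TABLE[codon])
    return protein


# How many codons map to each amino acid in the partial table.
codons_per_amino_acid = Counter(CODON_TABLE.values())
```

In this sketch `translate("AUGUUUUAA")` and `translate("AUGUUCUAA")` give the same result, since `UUU` and `UUC` both stand for phenylalanine: a one-letter misprint in the third position leaves the message unchanged. That is the error tolerance the redundancy provides.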

Even before the exact answer was reached, Crick crystallized its fundamental principles in a statement that he called (and is called to this day) the Central Dogma. It is a hypothesis about the direction of evolution and the origin of life; it is provable in terms of Shannon entropy in the possible chemical alphabets:

Once “information” has passed into protein it cannot get out again. In more detail, the transfer of information from nucleic acid to nucleic acid, or from nucleic acid to protein may be possible, but transfer from protein to protein, or from protein to nucleic acid is impossible. Information means here the precise determination of sequence.

 
 

The genetic message is independent and impenetrable: no information from events outside can change it.

Information had never been writ so small. Here is scripture at angstrom scale, published where no one can see, the Book of Life in the eye of a needle.

Omne vivum ex ovo. “The complete description of the organism is already written in the egg,” said Sydney Brenner to Horace Freeland Judson, molecular biology’s great chronicler, at Cambridge in the winter of 1971. “Inside every animal there is an internal description of that animal.… What is going to be difficult is the immense amount of detail that will have to be subsumed. The most economical language of description is the molecular, genetic description that is already there. We do not yet know, in that language, what the names are. What does the organism name to itself? We cannot say that an organism has, for example, a name for a finger. There’s no guarantee that in making a hand, the explanation can be couched in the terms we use for making a glove.”

Brenner was in a thoughtful mood, drinking sherry before dinner at King’s College. When he began working with Crick, less than two decades before, molecular biology did not even have a name. Two decades later, in the 1990s, scientists worldwide would undertake the mapping of the entire human genome: perhaps 20,000 genes, 3 billion base pairs. What was the most fundamental change? It was a shift of the frame, from energy and matter to information.

“All of biochemistry up to the fifties was concerned with where you get the energy and the materials for cell function,” Brenner said. “Biochemists only thought about the flux of energy and the flow of matter. Molecular biologists started to talk about the flux of information. Looking back, one can see that the double helix brought the realization that information in biological systems could be studied in much the same way as energy and matter.…

“Look,” he told Judson, “let me give you an example. If you went to a biologist twenty years ago and asked him, How do you make a protein, he would have said, Well, that’s a horrible problem, I don’t know … but the important question is where do you get the energy to make the peptide bond. Whereas the molecular biologist would have said, That’s not the problem, the important problem is where do you get the instructions to assemble the sequence of amino acids, and to hell with the energy; the energy will look after itself.”

By this time, the technical jargon of biologists included the words alphabet, library, editing, proofreading, transcription, translation, nonsense, synonym, and redundancy. Genetics and DNA had drawn the attention not just of cryptographers but of classical linguists. Certain proteins, capable of flipping from one relatively stable state to another, were found to act as relays, accepting ciphered commands and passing them to their neighbors—switching stations in three-dimensional communications networks. Brenner, looking forward, thought the focus would turn to computer science as well. He envisioned a science—though it did not yet have a name—of chaos and complexity. “I think in the next twenty-five years we are going to have to teach biologists another language still,” he said. “I don’t know what it’s called yet; nobody knows. But what one is aiming at, I think, is the fundamental problem of the theory of elaborate systems.” He recalled John von Neumann, at the dawn of information theory and cybernetics, proposing to understand biological processes and mental processes in terms of how a computing machine might operate. “In other words,” said Brenner, “where a science like physics works in terms of laws, or a science like molecular biology, to now, is stated in terms of mechanisms, maybe now what one has to begin to think of is algorithms. Recipes. Procedures.”
