Authors: Francis Crick
Meanwhile Joe, in his typical way, had founded that unusual organization, the RNA Tie Club. This was a very select club: Gamow decided who was to be a member. There were to be only twenty members, one for each amino acid, and not only did each member receive a tie, made to Gamow’s design by a haberdasher in Los Angeles (Jim Watson and Leslie Orgel arranged this), but also a tie pin with the short form of his own amino acid on it. I think I was Tyr but I’m not sure I ever got the tie pin. The club never met, but it had notepaper that listed its officers. Geo Gamow was described as Synthesizer, Jim Watson as Optimist, and I as Pessimist. Martynas Ycas was denoted Archivist and Alex Rich as Lord Privy Seal. As it turned out the club served as a mechanism for circulating speculative manuscripts to the few people interested. After I returned to England in the fall of 1956 I wrote a paper for it analyzing Gamow’s ideas, generalizing them, and suggesting what turned out to be an important idea, the adaptor hypothesis.
The paper was called “On Degenerate Templates and the Adaptor Hypothesis.” The main idea was that it was very difficult to consider how DNA or RNA, in any conceivable form, could provide a direct template for the side-chains of the twenty standard amino acids. What any structure was likely to have was a specific pattern of atomic groups that could form hydrogen bonds. I therefore proposed a theory in which there were twenty adaptors (one for each amino acid), together with twenty special enzymes. Each enzyme would join one particular amino acid to its own special adaptor. This combination would then diffuse to the RNA template. An adaptor molecule could fit in only those places on the nucleic acid template where it could form the necessary hydrogen bonds to hold it in place. Sitting there, it would have carried its amino acid to just the right place where it was needed.
There were several implications of this idea. The one I want to stress here was that it meant that the genetic code could have almost any structure, since its details would depend on which amino acid went with which adaptor. This had probably been decided very early in evolution and possibly by chance. Because of this pessimistic conclusion the paper led off with a quotation from an obscure Persian writer of the eleventh century: “Is there anyone so utterly lost as he that seeks a way where there is no way?” and ended with the remark, “In the comparative isolation of Cambridge, I must confess there are times when I have no stomach for the coding problem.”
The paper was circulated to members of the RNA Tie Club but was never published in a proper journal. It is my most influential unpublished paper. Eventually I did publish a short remark briefly outlining the idea and tentatively suggesting that the adaptor might be a small piece of nucleic acid. It soon turned out that a biochemist at the Harvard Medical School, Mahlon Hoagland, had quite independently obtained some experimental evidence that supported my proposal. As every molecular biologist now knows, the job is done by a family of molecules now called transfer RNA. Ironically, I did not immediately recognize that these transfer RNA molecules were the predicted adaptor because they were considerably bigger than I had expected, but I soon saw that there were no grounds for my objection. A little later Mahlon came to Cambridge for a year and we did experiments together on transfer RNA. We worked in a small upstairs room in the Molteno Institute that the director graciously allowed us to use since it was temporarily vacant.
Much theoretical effort during this period was put into attempts to solve the coding problem, especially by Gamow, Ycas, and Rich. Gamow and Ycas suggested a “combination code” in which the order of the bases in a triplet did not matter, only its combination of bases. While this was structurally implausible it had some appeal because it so happens there are just twenty combinations of four things taken three at a time. Again there was no hint as to how to allocate each amino acid to its own combination.
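The count of twenty is easy to check by enumeration. The following is a minimal Python sketch, assuming that "combinations" means unordered triplets in which a base may be repeated (so AAT, ATA, and TAA count as one); the names `BASES` and `combos` are illustrative only.

```python
from itertools import combinations_with_replacement

BASES = "ACGT"  # the four bases; the order is arbitrary

# Unordered triplets with repetition allowed: AAT, ATA and TAA
# all collapse to the single combination "AAT".
combos = ["".join(c) for c in combinations_with_replacement(BASES, 3)]

print(len(combos))  # 20
print(combos)
```

The same figure follows from the usual formula for selections with repetition, C(4 + 3 - 1, 3) = C(6, 3) = 20.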
For a time it was still thought that the code would have to be an overlapping one, and so the search for restrictions on the amino acid sequence continued. As new sequences became available they were added to those we had already collected, but there was little hint of any forbidden sequences, although the data were so sparse that at first we could not be sure that some sequences were missing. The hunt was mainly restricted to adjacent amino acids. There are 400 (20 × 20) possible amino acid doublets. Any overlapping triplet code could code for only 256 (64 possible triplets × 4) of these, so there had to be restrictions if the code were of this type. Sydney Brenner realized that one could sharpen this argument. Any one triplet would have only four other triplets as its neighbors on one side. For example, if the triplet in question was AAT, then the only triplets that could precede it were TAA, CAA, AAA, and GAA, while only ATT, ATC, ATA, and ATG could follow it, assuming as always that the code was overlapping. Thus if in the known sequences one particular amino acid had been shown to have at least nine neighbors following it, then it would have to have at least three triplets allocated to it, since two triplets could have only eight neighbors following them. Sydney was able to show that the number of triplets needed easily exceeded sixty-four and thus that all overlapping triplet codes were impossible. This proof assumed that the code was “universal” (that is, was the same in all the organisms from which the experimental data had come), but this was sufficiently plausible to make us almost certain that the idea of an overlapping code was wrong.
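The counting behind this argument can be sketched in a few lines of Python. This is only an illustration, assuming a fully overlapping code in which successive triplets are shifted by one base; the function names are hypothetical, and no real sequence data are used.

```python
from itertools import product
from math import ceil

BASES = "TCAG"  # listed in the same order as the example in the text

def predecessors(triplet):
    """Triplets that can precede `triplet` when the reading frame shifts by one base."""
    return [b + triplet[:2] for b in BASES]

def followers(triplet):
    """Triplets that can follow `triplet` when the reading frame shifts by one base."""
    return [triplet[1:] + b for b in BASES]

def codable_doublets():
    """Count (triplet, following triplet) pairs: an upper bound on codable amino acid doublets."""
    return sum(len(followers("".join(t))) for t in product(BASES, repeat=3))

def min_triplets_needed(observed_followers):
    """Each triplet allows at most four following neighbors, so round up n / 4."""
    return ceil(observed_followers / len(BASES))

print(predecessors("AAT"))     # ['TAA', 'CAA', 'AAA', 'GAA']
print(followers("AAT"))        # ['ATT', 'ATC', 'ATA', 'ATG']
print(codable_doublets())      # 256
print(min_triplets_needed(9))  # 3
```

Summing `min_triplets_needed` over all twenty amino acids, using the neighbor counts observed in real protein sequences, is the step that gave a total above sixty-four; those counts are not reproduced here.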
This still left the geometrical dilemma. In the process of protein synthesis, how could one amino acid get near enough to the next one to enable them to be joined together, since their triplets would have to be some distance apart as they were not overlapping? Sydney suggested that the postulated adaptors might each have a small flexible tail, to the end of which the appropriate amino acid was joined. Sydney and I did not at the time take this idea very seriously, referring to it as a “don’t worry” theory, meaning that we could see at least one way that nature might have solved the problem, so why worry at this stage what the correct answer actually was, especially as we had more important problems to tackle. In this case it has turned out that Sydney was correct. Each transfer RNA does indeed have a small flexible tail to which the amino acid is joined.
In parenthesis let me say that the English school of molecular biologists, when they needed a word for a new concept, usually used a common English word such as “nonsense” or “overlapping,” whereas the Paris school liked to coin one with classical roots, such as “capsomere” or “allosterie.” Ex-physicists, such as Seymour Benzer, enjoyed inventing new words ending in “-on,” such as “muton,” “recon,” and “cistron.” These new words often obtained rapid currency.
I was once persuaded by the molecular biologist François Jacob to give a talk to the physiology club in Paris. It was then the rule that all such talks had to be given in French. As I hardly speak French I did not warm to his suggestion at all, but François pointed out to Odile (who is bilingual in French and English) that if I gave the talk she also could have a trip to Paris, so my opposition was soon worn down. I decided to talk on the problem of the genetic code, thinking, quite incorrectly, that I could do most of it by simply writing on the blackboard. It soon became clear that I would have to speak some French in order to get the ideas across, so I started by dictating the whole talk to a secretary (normally I speak from notes). I then deleted all the jokes, since even when giving a talk to a secretary I found that my ad lib jokes intruded, and I felt I could hardly read them out in cold blood. Odile then translated the talk into French, and a typed version of her manuscript was produced, with various stress marks added to make it easier for me to read. There was a problem, however, about the translation of “overlapping.” What could be the French for that? Odile eventually remembered a suitable word, and we set off for Paris. I was sufficiently mistrustful of this strange word that on arrival I asked François what word they used for “overlapping.” “Oh,” he said, “we simply say ‘oh-ver-lap-pang.’”
I would like to report that the talk was a success. I started off fairly well, reading carefully, but as I warmed up my pronunciation got gradually wilder and wilder. The discussion, mainly in French, taxed me greatly. After the talk I asked François how it went. “It was not too bad,” he said tactfully, “but it was not you.” With no spontaneity and no jokes I saw just what he meant. I have never since attempted to give a talk in a foreign language, even though my French accent has improved a little over the years.
It was now clear that the code was not overlapping, but this immediately raised a new problem. If the code was read as a sequence of non-overlapping triplets, how did we know where the triplets began? Put another way, if we were to imagine that the correct triplets were marked by commas (for example, ATC,CGA,TTC,…), how did the cell know exactly where to put the commas? The obvious idea, that one started at the beginning (whatever that was) and went along three at a time, seemed too simple, and I thought (quite wrongly) that there must be another solution. It occurred to me to try to construct a code with the following properties. If read in the right phase, all the triplets would be “sense” (that is, stand for one amino acid or another), whereas all the out-of-phase triplets (those that bridged the imaginary commas) would be “nonsense”; that is, there would be no adaptor for them and thus they would not stand for any amino acid. I mentioned this idea to Leslie Orgel, who immediately pointed out that for such a code the maximum number of sense triplets was twenty. A triplet such as AAA must be nonsense since otherwise the sequence AAA,AAA could be read out of phase. (We tacitly assumed by now that any amino acid could follow any other amino acid.) That eliminated four of the sixty-four triplets. If the XYZ triplet was sense, then the cyclic permutations YZX and ZXY would have to be nonsense, so the maximum number of sense triplets was 60/3 = 20. The problem was: Did a set of twenty triplets exist that had this property? I was confined to bed with a nasty cold but found I could easily get up to seventeen. Leslie mentioned the problem to John Griffith, who found a set of twenty with the right properties. We soon found several other solutions (plus numerous permutations) so there was no doubt that such a code could exist. We even invented a plausible argument why it could be useful.
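Both halves of this argument, the upper bound of twenty and the existence of a set that reaches it, can be checked mechanically. The Python sketch below is illustrative only. The candidate set is built from a simple ordering rule chosen for convenience (the first base ranks below the second, and the third does not rank above the second); it is an assumption of this example and is not claimed to be the particular set that Griffith found.

```python
from itertools import product

BASES = "ACGT"

def cyclic_classes():
    """Group the 60 non-repeating triplets into classes of cyclic permutations."""
    classes = set()
    for t in product(BASES, repeat=3):
        w = "".join(t)
        if len(set(w)) == 1:  # AAA, CCC, GGG, TTT can never be sense
            continue
        classes.add(frozenset({w, w[1:] + w[:1], w[2:] + w[:2]}))
    return classes

def is_comma_free(code):
    """True if no out-of-phase triplet spanning two adjacent codewords is itself a codeword."""
    words = set(code)
    for u, v in product(words, repeat=2):
        if u[1:] + v[:1] in words or u[2:] + v[:2] in words:
            return False
    return True

# Illustrative candidate: triplets XYZ with rank(X) < rank(Y) >= rank(Z).
rank = {b: i for i, b in enumerate(BASES)}
candidate = [
    "".join(t)
    for t in product(BASES, repeat=3)
    if rank[t[0]] < rank[t[1]] >= rank[t[2]]
]

print(len(cyclic_classes()))     # 20, the upper bound 60/3
print(len(candidate))            # 20
print(is_comma_free(candidate))  # True
print(is_comma_free(["AAA"]))    # False: AAA,AAA reads as AAA out of phase
```

The rule works because an out-of-phase triplet always begins with either the last two bases or the last base of a codeword, and in neither position can the ordering condition be satisfied again.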
The problem of finding a solution having twenty sense triplets is actually not an especially difficult one. A little later I was booked on a night flight from the States to England. Waiting to board I found myself chatting to Fred Hoyle, the cosmologist. He asked what I was doing and I explained to him the idea of the comma-free code. The next morning, as the plane approached the English coast, he came back to where I was sitting with a solution he had worked out overnight.
Naturally Orgel, Griffith, and I were excited by the idea of a comma-free code. It seemed so pretty, almost elegant. You fed in the magic numbers 4 (the 4 bases) and 3 (the triplet) and out came the magic number 20, the number of the amino acids. Without more ado we wrote it up for the RNA Tie Club. Nevertheless I was hesitant. I realized that we had no other evidence for the code, other than the striking emergence of the number twenty. But then if some other number had come up we would have discarded the idea and looked around for some other code that led to twenty amino acids, so the number twenty by itself was not confirmatory evidence.
In spite of my worries, the new code attracted some attention. After four people had asked if they could quote our paper (an RNA Tie Club note was not equivalent to publication), we decided to write it up for the Proceedings of the U.S. National Academy of Science, where it duly appeared in 1957. An account of it even appeared in a book for the general reader called The Coil of Life, written by Ruth Moore, though this was not published till 1961, by which time we had ceased to believe in the idea.
Since in the comma-less code each amino acid had just one triplet it would have been possible, knowing which amino acid went with each triplet, to deduce the base composition of the DNA, assuming it all coded for protein, from the average amino acid composition of all its proteins. Because the latter was pretty similar in all organisms (though we knew now there were small variations), this would imply that the DNA molecules in all species had much the same composition. As more measurements were made, especially on different types of bacteria, it became clear that this was very far from the case. Of course in all cases the amount of A was the same as the amount of T (A=T) since the base pairing demanded this, and for the same reason G=C, but the structure of DNA itself put no restrictions on the ratio of A+T to G+C, and this ratio was found to vary a lot from one organism to another. This made it likely that the comma-free code must be wrong.
Its final downfall came from two directions. Our work on phase-shift mutants, described in chapter 12, made it unlikely, but a more decisive blow was dealt by Marshall Nirenberg when he showed that poly U (a simple form of RNA) coded for polyphenylalanine, whereas in a comma-free code UUU should have been a nonsense triplet. Finally the correct genetic code, confirmed by so many methods, has proved decisively that the whole idea is quite erroneous. However, it is just conceivable that it may have played a role near the origin of life, when the code first began to evolve, but this is pure speculation.
The idea of comma-free codes attracted the attention of combinatorialists, in particular Sol Golomb. We had failed to solve the problem of enumerating all possible triplet overlapping codes (with four letters) although we had found more than one solution. This enumeration was worked out by Golomb and Welch, using a very neat argument (which we ought to have seen for ourselves) as a key part of the proof. The problem was also solved by the Dutch mathematician H. Freudenthal at about the same time.