Authors: Francis Crick
I think there are two fair criticisms of natural selection. The first is that we cannot as yet calculate, from first principles, the rate of natural selection, except in a very approximate way, though this may become a little easier when we understand in more detail how organisms develop. It is, after all, rather odd that we worry so much how organisms evolved (a process difficult to study, since it happened in the past and is inherently unpredictable), when we still don’t know exactly how they work today. Embryology is much easier to study than evolution. The more logical strategy would be to find out first, in considerable detail, how organisms develop and how they work, and only then to worry how they evolved. Yet evolution is so fascinating a subject that we cannot resist the temptation to try to explain it now, even though our knowledge of embryology is still very incomplete.
The second criticism is that we may not yet know all the gadgetry that has been evolved to make natural selection work more efficiently. There may still be surprises for us in the tricks that are used to make for smoother and more rapid evolution. Sex is probably an example of such a mechanism, and there may, for all we know, be others as yet undiscovered. Selfish DNA—the large amounts of DNA in our chromosomes with no obvious function—may turn out to be part of another (see page 147). It is entirely possible that this selfish DNA may play an essential role in the rapid evolution of some of the complex genetic control mechanisms essential for higher organisms.
But leaving these reservations aside, the process is powerful, versatile, and very important. It is astonishing that in our modern culture so few people really understand it.
You could well accept all those arguments about evolution, natural selection, and genes, together with the idea that genes are units of instruction in an elaborate program that both forms the organism from the fertilized egg and helps control much of its later behavior. Yet you might still be puzzled. How, you might ask, can the genes be so clever? What could genes possibly do that would allow the construction of all the very elaborate and beautifully controlled parts of living things?
To answer this we must first grasp what level of size we are talking about. How big is a gene? At the time I started in biology—the late 1940s—there was already some rather indirect evidence suggesting that a single gene was perhaps no bigger than a very large molecule—that is, a macromolecule. Curiously enough, a simple, suggestive argument based on common knowledge also points in this direction.
Genetics tells us that, roughly speaking, we get half of all our genes from our mother, in the egg, and the other half from our father, in the sperm. Now, the head of a human sperm, which contains these genes, is quite small. A single sperm is far too tiny to be seen clearly by the naked eye, though it can be observed fairly easily using a high-powered microscope. Yet in this small space must be housed an almost complete set of instructions for building an entire human being (the egg providing a duplicate set). Working through the figures, the conclusion is inescapable that a gene must be, by everyday standards, very, very small, about the size of a very large chemical molecule. This alone does not tell us what a gene does, but it does hint that it might be sensible to look first at the chemistry of macromolecules.
It was also known at that time that each chemical reaction in the cell was catalyzed by a special type of large molecule. Such molecules were called enzymes. Enzymes are the machine tools of the living cell. They were first discovered in 1897 by Eduard Buchner, who received a Nobel Prize ten years later for his discovery. In the course of his experiments, he crushed yeast cells in a hydraulic press and obtained a rich mixture of yeast juices. He wondered whether such fragments of a living cell could carry out any of its chemical reactions, since at that time most people thought that the cell must be intact for such reactions to occur. Because he wanted to preserve the juice, he adopted a stratagem used in the kitchen: he added a lot of sugar. To his astonishment, the juice fermented the sugar solution! Thus were enzymes discovered. (The word enzyme means “in yeast.”) It was soon found that enzymes could be obtained from many other types of cell, including our own, and that each cell contained very many distinct kinds of enzymes. Even a simple bacterial cell may contain more than a thousand different types of enzymes. There may be hundreds or thousands of molecules of any one type.
In favorable circumstances an enzyme could be purified away from all the others and its action studied by itself in solution. Such studies showed that each enzyme was very specific, and catalyzed only one particular chemical reaction or, at most, a few related ones. Without that particular enzyme the chemical reaction, under the mild conditions of temperature and acidity usually found in living cells, would proceed only very, very slowly. Add the enzyme and the reaction goes at a good pace. If you make a well-dispersed solution of starch in water, very little will happen. Spit into it and the enzyme amylase in your saliva will start to digest the starch and release sugars.
The next major discovery was that each of the enzymes studied was a macromolecule and that they all belonged to the same family of macromolecules called proteins. The key discovery was made in 1926 by a one-armed American chemist called James Sumner. It is not all that easy to do chemistry when you have only one arm (he had lost the other in a shooting accident when he was a boy) but Sumner, who was a very determined man, decided he would nevertheless demonstrate that enzymes were proteins. Though he showed that one particular enzyme, urease, was a protein and obtained crystals of it, his results were not immediately accepted. In fact, a group of German workers hotly contested the idea, which somewhat embittered Sumner, but it turned out that he was correct. In 1946 he was awarded part of the Nobel Prize in Chemistry for his discovery. Though very recently a few significant exceptions to this rule have turned up, it is still true that almost all enzymes are proteins.
Proteins are thus a family of subtle and versatile molecules. As soon as I learned about them I realized that one of the key problems was to explain how they were synthesized.
There was a third important generalization, though in the 1940s this was sufficiently new that not everybody was inclined to accept it. This idea was due to George Beadle and Ed Tatum. (They too were to receive a Nobel Prize, in 1958, for their discovery.) Working with the little bread-mold Neurospora, they had found that each mutant of it they studied appeared to lack just a single enzyme. They coined the famous slogan “One gene—one enzyme.”
Thus the general plan of living things seemed almost obvious. Each gene determines a particular protein. Some of these proteins are used to form structures or to carry signals, while many of them are the catalysts that decide what chemical reactions should and should not take place in each cell. Almost every cell in our bodies has a complete set of genes within it, and this chemical program directs how each cell metabolizes, grows, and interacts with its neighbors. Armed with all this (to me) new knowledge, it did not take much to recognize the key questions. What are genes made of? How are they copied exactly? And how do they control, or at least influence, the synthesis of proteins?
It had been known for some time that most of a cell’s genes are located on its chromosomes and that chromosomes were probably made of nucleoprotein—that is, of protein and DNA, with perhaps some RNA as well. In the early 1940s it was thought, quite erroneously, that DNA molecules were small and, even more erroneously, simple. Phoebus Levene, the leading expert on nucleic acid in the 1930s, had proposed that they had a regular repeating structure [the so-called tetranucleotide hypothesis]. This hardly suggested that they could easily carry genetic information. Surely, it was thought, if genes had to have such remarkable properties, they must be made of proteins, since proteins as a class were known to be capable of such remarkable functions. Perhaps the DNA there had some associated function, such as acting as a scaffold for the more sophisticated proteins.
It was also known that each protein was a polymer. That is, it consisted of a long chain, known as a polypeptide chain, constructed by stringing together, end to end, small organic molecules, called monomers since they are the elements of a polymer. In a homopolymer, such as nylon, the small monomers are usually all the same. Proteins are not as simple as that. Each protein is a heteropolymer, its chains being strung together from a selection of somewhat different small molecules, in this instance amino acids. The net result is that, chemically speaking, each polypeptide chain has a completely regular backbone, with little side-chains attached at regular intervals. It was believed that there were about twenty different possible side-chains (the exact number was not known at that time). The amino acids (the monomers) are just like the letters in a font of type. The base of each kind of letter from the font is always the same, so that it can fit into the grooves that hold the assembled type, but the top of each letter is different, so that a particular letter will be printed from it. Each protein has a characteristic number of amino acids, usually several hundred of them, so any particular protein could be thought of crudely as a paragraph written in a special language having about twenty (chemical) letters. It was not then known for certain, as it is now, that for each protein the letters have to be in a particular order (as indeed they have to be in a particular paragraph). This was first shown a little later by the biochemist Fred Sanger, but it was easy enough to guess that this was likely to be true.
Of course each paragraph in our language is really one long line of letters. For convenience this is split up into a series of lines, written one under the other, but this is only a secondary matter, since the meaning is exactly the same whether the lines are long or short, few or many, provided we take care about splitting the words at the end of each line. Proteins were known to be very different. Although the polypeptide backbone is chemically regular, it contains flexible links, so that in principle many different three-dimensional shapes are possible. Nevertheless, each protein appeared to have its own shape, and in many cases this shape was known to be fairly compact (the word used was “globular”) rather than very extended (or “fibrous”). A number of proteins had been crystallized, and these crystals gave detailed X-ray diffraction patterns, suggesting that the three-dimensional structure of each molecule of a particular kind of protein was exactly (or almost exactly) the same. Moreover many proteins, if heated briefly to the boiling point of water, or even to some temperature below this, became denatured, as if they had unfolded so that their three-dimensional structure had been partly destroyed. When this happened the denatured protein usually lost its catalytic or other function, strongly suggesting that the function of such a protein depended on its exact three-dimensional structure.
And now we can approach the baffling problem that appeared to face us. If genes are made of protein, it seemed likely that each gene had to have a special three-dimensional, somewhat compact structure. Now, a vital property of a gene was that it could be copied exactly for generation after generation, with only occasional mistakes. What we were trying to guess was the general nature of this copying mechanism. Surely the way to copy something was to make a complementary structure—a mold—and then to make a further complementary structure of the mold, to produce in this way an exact copy of the original. This, after all, is how, broadly speaking, sculpture is copied. But then the dilemma arose: It is easy to copy the outside of a three-dimensional structure in this way, but how on earth could one copy the inside? The whole process seemed so utterly mysterious that one hardly knew how to begin thinking about it.
Of course, now that we know the answer, it all seems so completely obvious that no one nowadays remembers just how puzzling the problem seemed then. If by chance you do not know the answer, I ask you to pause a moment and reflect on what the answer might be. There is no need, at this stage, to bother about the details of the chemistry. It is the principle of the idea that matters. The problem was not made easier by the fact that many of the properties of proteins and genes just outlined were not known for certain. All of them were plausible and most of them seemed very probable but, as in most problems near the frontiers of research, there were always nagging doubts that one or more of these assumptions might be dangerously misleading. In research the front line is almost always in a fog.
So what was the answer? Curiously enough, I had arrived at the correct solution before Jim Watson and I discovered the double-helical structure of DNA. The basic idea (which was not entirely new) was this: All a gene had to do was to get the sequence of the amino acids correct in that protein. Once the correct polypeptide chain had been synthesized, with all its side chains in the right order, then, following the laws of chemistry, the protein would fold itself up correctly into a unique three-dimensional structure. (What the exact three-dimensional structure of each protein was remained to be determined.) By this bold assumption the problem was changed from a three-dimensional one to a one-dimensional one, and the original dilemma largely disappeared.
Of course, this had not solved the problem. It had merely transformed it from an intractable one to a manageable one. For the problem still remained: how to make an exact copy of a one-dimensional sequence. To approach that we must return to what was known about DNA.
By the late 1940s our knowledge of DNA had improved in several important respects. It had been discovered that DNA molecules were not, after all, very short. Exactly how long they were was not clear. We know now that they appeared to be short because, being long molecules (in the sense that a piece of string is long), they could easily be broken in the process of getting them out of the cell and manipulating them in the test tube. Just stirring a DNA solution is enough to break the longer molecules. Their chemistry was now known more correctly, and moreover the tetranucleotide hypothesis was dead, killed by some very beautiful work by a chemist at Columbia, the Austrian refugee Erwin Chargaff. DNA was known to be a polymer, but with a very different backbone and with only four letters in its alphabet, rather than twenty. Chargaff showed that DNA from different sources had rather different amounts of those four bases (as they were called). Perhaps DNA was not such a dumb molecule after all. It might conceivably be long enough and varied enough to carry some genetic information.
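The remark that DNA might be "long enough and varied enough" can be made concrete with a little arithmetic. The sketch below is only an illustration, not a calculation from the period. It assumes that every letter of an alphabet is equally likely, that a protein has 300 amino acids (one reading of "several hundred"), and that there are exactly twenty amino-acid letters. The function names are invented for the example.

```python
import math

AMINO_ACID_LETTERS = 20  # assumed: "about twenty" side-chains
DNA_LETTERS = 4          # the four bases


def bits_per_letter(alphabet_size: int) -> float:
    """Most information one position can carry if all letters are equally likely."""
    return math.log2(alphabet_size)


def max_bits(alphabet_size: int, length: int) -> float:
    """Upper bound on the information in a freely varying chain of this length."""
    return length * bits_per_letter(alphabet_size)


def letters_needed(bits: float, alphabet_size: int) -> int:
    """Smallest chain length over this alphabet that can carry `bits` bits."""
    return math.ceil(bits / bits_per_letter(alphabet_size))


protein_length = 300  # assumed illustrative value
protein_bits = max_bits(AMINO_ACID_LETTERS, protein_length)
dna_length = letters_needed(protein_bits, DNA_LETTERS)

# A chain that only repeats one fixed four-base unit is fully described by
# that unit, so its capacity never exceeds that of four free letters.
repeating_unit_bits = max_bits(DNA_LETTERS, 4)

print(f"protein of {protein_length} letters: up to {protein_bits:.0f} bits")
print(f"DNA letters needed to match that: {dna_length}")
print(f"strictly repeating four-base chain: at most {repeating_unit_bits:.0f} bits")
```

On these assumptions a four-letter chain can match a twenty-letter one simply by being a little more than twice as long, whereas a strictly repeating chain carries almost nothing however long it grows. This is a pure counting bound and says nothing about how a cell actually relates one sequence to the other.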