Authors: Adam Rutherford
With RNA being a decent contender for the first world of information and replication, this is a simple piece of RNA that does both those things. The letters are standard RNA (with N being a wild card, representing any of the four bases A, C, G, and U), and the molecule is made of two parts (shown separated by a dash). In a test tube, R3C contorts into a hairpin shape. Its function is to make a mirror version of itself by linking the two parts together. That mirror version in turn makes a new copy of the original, and so on. This goes on ad infinitum, as long as the system is fed with the ingredients that will enable the continued chemical reactions which drive replication. Hundreds of millions of copying molecules can be made in a few hours.
It is a kind of protogene made of RNA, not DNA. Unlike our own genes, which have functions and instructions for proteins that build tissue and bone, or issue commands so that other genes will do so, the instruction that R3C carries is simply “copy me.” DNA relies on other mechanics to copy itself, but R3C doesn't need any help: it's a photocopier that only makes other photocopiers. The fact that its only instruction is to copy itself doesn't make it very useful for a modern organism. But genetics had to start somewhere, and, hypothetically, this might be similar to what the first genes looked like: a copying machine. RNA that has function is called a ribozyme.
It's in these simple, double-action molecules that many scientists see a fundamental precept of life: reproduction of information. Joyce believes this is “where chemistry starts turning into biology. It's the first case, other than in biology, of molecular information having been immortalized.”
In biological terms, the ribozyme is the smallest unit of information; in computer terms, a single bit. But in the test tube, as ribozymes promote their own replication, they enact a form of Darwinian selection. Joyce's experiment at first was marked with fidelity: the ribozymes reproduced themselves impeccably. But evolution requires flawed copying, and, as Joyce tells me, “perfection is boring.” So Joyce introduced imperfection into the mix (sequence infidelity, the equivalent of spelling mistakes) so that each copy was potentially fractionally different. Hence, Joyce introduced the foundation of Darwinian selection: variation. Out of a pool of these short RNA molecules, ones that reproduce themselves with greater success emerge to become the dominant form. These molecules have, through no guiding hand other than Joyce's providing the right conditions and feedstock, undergone a nonliving form of natural selection.
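The difference between perfect and flawed copying can be sketched in a few lines of Python. This is only a toy illustration, not a model of Joyce's chemistry: the parent sequence, the error rate, and the function names are all made up for the example. It shows that with no copying errors every copy is identical, whereas a small per-letter error rate produces a pool of variants on which selection could act.

```python
import random

BASES = "ACGU"

def copy_with_errors(seq, error_rate, rng):
    """Copy an RNA string, misspelling each letter with probability error_rate."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            # A "spelling mistake": swap in one of the three other bases.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def distinct_after_copying(seq, copies, error_rate, seed=0):
    """Count how many different sequences appear among the copies."""
    rng = random.Random(seed)
    return len({copy_with_errors(seq, error_rate, rng) for _ in range(copies)})

# Arbitrary made-up sequence for illustration; this is NOT the R3C sequence.
parent = "GGAUCCGAUUACGGCAUUAGC"

perfect = distinct_after_copying(parent, copies=1000, error_rate=0.0)
sloppy = distinct_after_copying(parent, copies=1000, error_rate=0.05)
print("distinct sequences with perfect copying:", perfect)
print("distinct sequences with 5% errors per letter:", sloppy)
```

With perfect copying the pool never contains more than one sequence, so there is nothing to select between; the error-prone run supplies the variation.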
It's a clever approach to the origin of the genetic code. These experimental ribozymes are not natural but they are only partially designed. David Bartel and Jack Szostak created a technique in the 1990s that gave a great fillip to the idea of an ancient RNA world before DNA and proteins. Their technique allows functional ribozymes to at least partly create themselves. It's akin to the fabled monkeys-with-typewriters idea; that with a big enough number of primates bashing away on keyboards, one will eventually type a Shakespearean sonnet. Bartel and Szostak put together a pool of literally trillions of strings of RNA, which were identical for a short stretch at one end, then random for the next two hundred letters. In the trillions of possible combinations they were looking to find one that, purely by chance, had the ability to add another similar RNA molecule onto itself. They then fished in that pool with a tagged piece of RNA, a bait to hook any one of those trillions of random molecules that could clasp another RNA molecule. In the simian-typing metaphor, it would be like searching trillions of simian text files with the phrase “darling buds of May.” If that sounds easy, remember that these are entirely random sequences, and they are asking for meaning, or in this case function, to come from nothing. Bartel and Szostak found exactly that ability from their pool of RNA molecules at a rate of around one in twenty trillion.
The monkeys-with-typewriters example is purely random, and makes the point that, when working with very large numbers, patterns (or prose) will emerge.
Evolution is very much not random. The variation in genetic code may have arisen entirely by chance, but selection (whether natural or by the hand of a creator) is quite the opposite. Bartel and Szostak fully emulate biological evolution with their method by allowing variation to occur, but selecting the variants that work. Then, just as in nature, they repeat, but this time using only the ribozymes that already had demonstrated an RNA linking ability (as if, hoping not to stretch my monkey metaphor too far, giving them the first line of a sonnet as a starting point to their frenetic random typing). After ten rounds of fishing for those that could bind RNA together, the ribozymes that had made it through this harshest of talent contests were several million times better at RNA joining. There's a drop of medicine to take with that, which is that naturally occurring enzymes work around ten thousand times faster than the best of Bartel and Szostak's ribozymes, which were unlikely to improve much more. But recall that we are dealing with introducing function into a world where there was none, and this RNA world was transient, destined to be replaced by the more efficient and effective DNA and proteins.
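The cycle of random variation followed by selection, repeated over several rounds, can be illustrated with a toy simulation. This is a sketch under loud assumptions and not Bartel and Szostak's protocol: the "activity" score (how many positions match a hypothetical motif called `TARGET`), the pool size, the fraction kept, and the mutation rate are all invented for the example.

```python
import random

BASES = "ACGU"
TARGET = "GGCAUCGAUUCGAUGCCAUG"  # hypothetical "active" motif, purely illustrative

def activity(seq):
    """Toy score: number of positions that match the hypothetical motif."""
    return sum(a == b for a, b in zip(seq, TARGET))

def mutate(seq, rate, rng):
    """Copy seq, redrawing each letter at random with probability rate."""
    return "".join(rng.choice(BASES) if rng.random() < rate else b for b in seq)

def run_selection(pool_size=2000, rounds=10, keep=0.1, rate=0.02, seed=1):
    rng = random.Random(seed)
    # Start from entirely random sequences.
    pool = ["".join(rng.choice(BASES) for _ in TARGET) for _ in range(pool_size)]
    history = [sum(map(activity, pool)) / pool_size]
    for _ in range(rounds):
        # "Fishing": keep only the most active fraction of the pool.
        survivors = sorted(pool, key=activity, reverse=True)[: int(pool_size * keep)]
        # Amplify the survivors back to full size, with occasional copying errors.
        pool = [mutate(rng.choice(survivors), rate, rng) for _ in range(pool_size)]
        history.append(sum(map(activity, pool)) / pool_size)
    return history

history = run_selection()
print("mean activity per round:", [round(h, 1) for h in history])
```

The variation is random, but the mean score climbs from round to round because only the variants that work are allowed to seed the next pool.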
In essence, these experiments are attempting to create something more fundamental than life. That ribozymes were the first genes, needing neither protein nor DNA, is not something we can actually know for sure, though it is uncontroversial now. Ribozymes do currently exist in nature, but if RNA self-replicators were the first genes, they've been extinct for more than three billion years, and we are experimenting to resuscitate an unrecorded dead language that is only known by its descendants.
In Cambridge, England, the Laboratory of Molecular Biology (LMB) has been called the Nobel Prize factory, as nineteen of those most sought-after medals were won in a dull gray glass building. The LMB recently relocated around the corner to a shiny new gray glass building, where Philipp Holliger runs a lab to explore the lost RNA world. His crew used a similar test-tube evolution to pick out another candidate for an aboriginal gene: they started with a smaller lucky dip of a paltry ten million variants and, by using some technical wizardry involving oil and magnetic beads, selected a ribozyme that produces more RNA. This one doesn't replicate itself, but they managed to cajole it into producing an entirely different ribozyme, a well-known one rather more dramatically called the hammerhead.
So far, Holliger's ribozyme can only make new RNAs of around one hundred bases in length, which is a little short for a ribozyme and a long way from the average modern gene, but Holliger and his crew are working on extension.
Here we have the emergence of plausible models for the origin of information in chemistry and, more important, the origin of information copying. This is a hallmark of life. The amazing ability of Joyce's and Holliger's ribozymes is their spontaneous function. Joyce's copies itself, and Holliger's copies a lot else. The great goal is to shape the emergence of one that does both.
Creation by Deletion
But we can go back further. Ribozymes, as with all genetic code, have their four letters with which they harbor information: A, U (instead of T in DNA), C, and G. If we assume that Luca had this alphabet in place, and that is as far back as our historical records can go, we have no record of how this complete code came to be. This is unlike the evolution of language, where we have historical records of previous forms with different letters and meanings. The acquisition of the language of DNA and RNA is crucial to the understanding of the origin of life. That all four letters of code were acquired simultaneously seems less likely than that they were acquired sequentially, one at a time. We can test that idea by deleting them sequentially.
As in Scrabble, some letters are more valuable than others. C, cytosine, bound to a G is not as stable as a bit of code containing A and T, which will be the first to fall apart in heat or cold. On top of that, in RNA, the letters can pair up in a slightly different way from the neat and tidy rungs of the DNA ladder. This is rather sweetly called wobble pairing, and helps the RNA fold into the loops and folds that give ribozymes their self-copying ability. In the absence of C, both normal and wobble pairing can still occur.
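The claim that both kinds of pairing survive the loss of C can be checked with a short enumeration. The sketch below assumes the simplest textbook picture: the normal RNA pairs are A with U and G with C, and the only wobble pair considered is G with U. Other noncanonical pairs are ignored, and the function names are invented for the example.

```python
from itertools import combinations

# Standard (Watson-Crick) RNA pairs, plus the G-U wobble pair.
WATSON_CRICK = {frozenset("AU"), frozenset("GC")}
WOBBLE = {frozenset("GU")}

def pair_type(x, y):
    """Classify a pair of bases as 'normal', 'wobble', or None (no pairing)."""
    pair = frozenset((x, y))
    if pair in WATSON_CRICK:
        return "normal"
    if pair in WOBBLE:
        return "wobble"
    return None

def available_pairs(alphabet):
    """List every pairing that is possible using only the given letters."""
    found = {"normal": [], "wobble": []}
    for x, y in combinations(sorted(alphabet), 2):
        kind = pair_type(x, y)
        if kind:
            found[kind].append(x + y)
    return found

print(available_pairs("ACGU"))  # {'normal': ['AU', 'CG'], 'wobble': ['GU']}
print(available_pairs("AGU"))   # {'normal': ['AU'], 'wobble': ['GU']}
```

Without C the G-C pair disappears, but A-U (normal) and G-U (wobble) remain, so a three-letter RNA can still fold back on itself.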
Once again starting from a colossal pool of ribozymes with random variations in their sequence, Gerald Joyce and his colleague Jeff Rogers drove evolution by repeatedly fishing for ones containing no cytidine, using bait with little affinity for that letter, but that could still connect two RNA molecules. Effectively, they were breeding out a particular trait from their stock, but instead of its being an undesirable characteristic in an animal or plant, the stock is a 140-letter ribozyme, and the trait is the letter C. Even in the absence of one-quarter of the available letters, they bred a ribozyme that could ably retain its joining function.
Once you've discarded one of the letters and shown that biological molecules can still work, what, then, is the next experiment? Joyce's approach, obvious but no less brilliant, was to evolve a ribozyme with only two letters of code. Necessarily, this involved evolving ribozymes with similar but different letters, in this case referred to as D and U, and the activity of the molecule they bred was much reduced from a full alphabet version. But nonetheless it still functions as an active biological tool. It may even be the case that this limited alphabet molecule would have been advantageous in hot Archaean waters, as the folded bonds containing C might have self-destructed at high temperatures.
These clever experiments don't show what actually happened in the origin of genetics, but they do show what might have happened. They show that evolution can proceed with an alphabet that is more restricted than the one life now uses. They feed into a framework that persuasively builds up an origin for coded replication; in other words, genetics. If one of the defining characteristics of life-forms is the ability to store and reproduce information, then one of the most fundamental questions of the origin of life is how such a complex system could have begun. How it compares with what might have happened four billion years ago is difficult to say. We have the advantage of ingredients and design, at least experimental design, rather than the messy, low-tech, bucket chemistry of the early earth. But that first time had millions of years, whereas scientists have made these new replicators in a decade. The task is not to replicate what happened once; that will be impossible. But in all origin of life studies it's important to remember that we know the answer: life is the answer. The question is finding a believable route to get there, and in these experiments such a thing is beginning to emerge.
The Origin of the Alphabet
We can go back further still. When Gerald Joyce asks one of his brews to copy and mutate ribozymes, he has to supply the ingredients. The Archaean earth didn't have a sterile lab with sterile glassware and manufactured chemicals bought purified from industrial companies. Joyce's ribozymes provide a plausible mechanism for the first genes, and the basis for a language that has survived for billions of years. The letters of this language are the bases of RNA, so the next question is: where did they come from? These are not trivial molecules, certainly in terms of their importance in bearing the code of life, but also because they are complex. We describe them as complex molecules for a couple of reasons, the first one being that the letters of genetic code are manufactured in a difficult-to-learn biological pathway involving many different proteins. Our cells do this unthinkingly, as they have evolved highly advanced processes of metabolism that sustain their own existence by making their own parts. We look at these processes and carefully work them out with awe, and sometimes terror, when studying for exams. The metabolic pathways that create bases in our cells are evolved over innumerable iterations: the most stringent, patient, and ruthless design process known. Yet naturally, if these complex chemicals predate the seed of life's tree, there was no sophisticated metabolism with which they could be built.
With that in mind, the chemical structure of the letters of genetic code is complex primarily because we say so. If that sounds like circular logic, the second reason we think of them as complex is because they're not easy for us to synthesize chemically, either. The cell's process of synthesis is intricate so we see it as complicated, and our attempts to replicate that synthesis are tricky as well. Those bases have to line up in exactly the right way to form a working ribozyme, and for decades this has proved to be largely elusive in the chemistry lab. Additionally, there is a major supply problem. To make one of Gerald Joyce's self-replicating ribozymes you need seventy-odd RNA bases, but that's only if you know exactly what you are making. If one were to emerge from a random pool of millions, you'd need billions of bases to begin with. And when it starts copying itself, the number of letters goes up exponentially. Ten rounds of replication of a 100-base ribozyme would need a pool of more than 10,000 bases, but one hundred rounds of copying would need more than 10^30. Add to this the fact that each round reduces the concentration of the letter pool, which makes the replication harder. It would be like spelling words out of alphabet soup; after using up a few letters, it becomes harder to spell more words. In order for you to keep spelling out new words, you need a constant supply of letters, possibly an entire Campbell's factory's worth.
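The scale of the supply problem can be made concrete with a little arithmetic. The sketch below assumes the simplest possible model, which the text does not spell out: replication starts from a single molecule and every round doubles the number of copies. The function name `bases_needed` is invented for the example.

```python
def bases_needed(length, rounds):
    """Bases locked up in copies after `rounds` doublings of one molecule.

    Assumes (for illustration only) a single starting molecule and a perfect
    doubling of the population in every round.
    """
    copies = 2 ** rounds
    return length * copies

ten_rounds = bases_needed(100, 10)
hundred_rounds = bases_needed(100, 100)

print(f"10 rounds:  {ten_rounds:,} bases")
print(f"100 rounds: {hundred_rounds:.2e} bases")
```

Under this doubling assumption, ten rounds already consume about a hundred thousand bases, and a hundred rounds consume on the order of 10^32, both comfortably above the thresholds quoted above. Whatever the exact model, the demand for letters grows exponentially with the number of rounds.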
Therefore, before we get to the self-catalyzing, self-copying ribozymes, an earlier, more basic issue to examine is where those ingredients come from in the natural, inanimate, embryonic world. The problem is this: without biochemistry to hand, how did these letters spontaneously form, when we find it so difficult in the lab? This has been a question for nearly forty years, the duration of the RNA world hypothesis's life.