Gödel, Escher, Bach: An Eternal Golden Braid (90 page)

Read Gödel, Escher, Bach: An Eternal Golden Braid Online

Authors: Douglas R. Hofstadter

Tags: #Computers, #Art, #Classical, #Symmetry, #Bach; Johann Sebastian, #Individual Artists, #Science, #Science & Technology, #Philosophy, #General, #Metamathematics, #Intelligence (AI) & Semantics, #G'odel; Kurt, #Music, #Logic, #Biography & Autobiography, #Mathematics, #Genres & Styles, #Artificial Intelligence, #Escher; M. C

BOOK: Gödel, Escher, Bach: An Eternal Golden Braid
7.44Mb size Format: txt, pdf, ePub

So G is an outstanding example of a self-ref via translation-hardly the most straightforward case. One might also think back to some of the Dialogues, for some of them, too, are self-refs via translation. For instance, take the
Sonata for Unaccompanied Achilles
.

In that Dialogue, several references are made to the Bach Sonatas for unaccompanied violin, and the Tortoise's suggestion of imagining harpsichord accompaniments is particularly interesting. After all, if one applies this idea to the Dialogue itself, one invents lines which the Tortoise is saying; but if one assumes that Achilles' part stands alone (as does the violin), then it is quite wrong to attribute any lines at all to the Tortoise. In any case, here again is a self-ref by means of a mapping which maps Dialogues onto pieces by Bach. And this mapping is

left, of course, for the reader to notice. Yet even if the reader does not notice it, the mapping is still there, and the Dialogue is still a self-ref.

A Self-Rep by Augmentation

We have been likening self-reps to canons. What, then, would be a fair analogue to a canon by augmentation? Here is a possibility: consider a program which contains a dummy loop whose only purpose is to slow up the program. A parameter might tell how often to repeat the loop. A self-rep could be made which prints out a copy of itself, but with the parameter changed, so that when that copy is run, it will run at half the speed of its parent program; and its "daughter" will in turn run at half again the speed, and so on . . . None of these programs prints itself out precisely; yet all clearly belong to a single "family".

This is reminiscent of the self-reproduction of living organisms. Clearly, an individual is never identical to either of its parents; why, then, is the act of making young called "self-reproduction'? The answer is that there is a coarse-grained isomorphism between parent and child; it is an isomorphism which preserves the information about species. Thus, what is reproduced is the class, rather than the instance. This is also the case in the recursive picture Gplot, in Chapter V: that is, the mapping between "magnetic butterflies" of various sizes and shapes is coarse-grained; no two are identical, but they all belong to a single

"species", and the mapping preserves precisely that fact. In terms of self-replicating programs, this would correspond to a family of programs, all written in "dialects" of a single computer language; each one can write itself out, but slightly modified, so that it comes out in a dialect of its original language.

A Kimian Self-Rep

Perhaps the sneakiest example of a self-rep is the following: instead of writing a legal expression in the compiler language, you type one of the compiler's own error messages.

When the compiler looks at your "program", the first thing it does is get confused, because your "program" is ungrammatical; hence the compiler prints out an error message. All you need to do is arrange that the one it prints out will be the one you typed in. This kind of selfrep, suggested to me by Scott Kim, exploits a different level of the system from the one you would normally approach. Although it may seem frivolous, it may have counterparts in complex systems where self-reps vie against each other for survival, as we shall soon discuss.

What Is the Original?

Besides the question "What constitutes a copy?", there is another fundamental philosophical question concerning self-reps. That is the obverse

side of the coin: "What is the original?" This can best be explained by referring to some examples:

(1) a program which, when interpreted by some interpreter running on some computer, prints itself out;

(2) a program which, when interpreted by some interpreter running on some computer. prints itself out along with a complete copy of the interpreter (which, after all, is also a program);

(3) a program which, when interpreted by some interpreter running on some computer, not only prints itself out along with a complete copy of the interpreter, but also directs a mechanical assembly process in which a second computer, identical to the one on which the interpreter and program are running, is put together.

It is clear that in (1), the program is the self-rep. But in (3), is it the program which is the selfrep, or the compound system of program plus interpreter, or the union of program, interpreter, and processor?

Clearly, a self-rep can involve more than just printing itself out. In fact, most of the rest of this Chapter is a discussion of self-reps in which data, program, interpreter, and processor are all extremely intertwined, and in which self-replication involves replicating all of them at once.

Typogenetics

We are now about to broach one of the most fascinating and profound topics of the twentieth century: the study of "the molecular logic of the living state", to borrow Albert Lehninger's richly evocative phrase. And logic it is, too but of 'a sort more complex and beautiful than any a human mind ever imagined. We will come at it from a slightly novel angle: via an artificial solitaire game which I call Typogenetics-short for "Typographical Genetics". In Typogenetics I have tried to capture some ideas of molecular genetics in a typographical system which, on first sight, resembles very much the formal systems exemplified by the
MIU
-system. Of course, Typogenetics involves many simplifications, and therefore is useful primarily for didactic purposes.

I should explain immediately that the field of molecular biology is a field in which phenomena on several levels interact, and that Typogenetics is only trying to illustrate phenomena from one or two levels. In particular, purely chemical aspects have been completely avoided-they belong to a level lower than is here dealt with; similarly, all aspects of classical genetics (viz., nonmolecular genetics) have also been avoided-they belong to a level higher than is here dealt with. I have intended in Typogenetics only to give an intuition for those processes centered on the celebrated
Central Dogma of
Molecular Biology,
enunciated by Francis Crick (one of the co-discoverers of the double-helix structure of DNA):

DNA => RNA => proteins.

It is my hope that with this very skeletal model I have constructed the reader will perceive some simple unifying principles of the field principles which might otherwise be obscured by the enormously intricate interplay of phenomena at many different levels. What is sacrificed is, of course, strict accuracy; what is gained is, I hope, a little insight.

Strands, Bases, Enzymes

The game of Typogenetics involves typographical manipulation on sequences of letters.

There are four letters involved:

A C G T.

Arbitrary sequences of them are called
strands
. Thus, some strands are:
GGGG

ATTACCA

CATCATCATCAT

Incidentally, "
STRAND
" spelled backwards begins with "
DNA
". This is appropriate since strands, in Typogenetics, play the role of pieces of DNA (which, in real genetics, are often called "strands"). Not only this, but "
STRAND
" fully spelled out backwards is "
DNA RTS
", which may be taken as an acronym for "
D
NA
R
apid
T
ransit
S
ervice". This, too, is appropriate, for the function of "messenger
RNA
"-which in Typogenetics is represented by strands as well-is quite well characterized by the phrase "
R
apid
T
ransit
S
ervice" for
DNA
, as we shall see later.

I will sometimes refer to the letters
A, C, G, T
as bases, and to the positions which they occupy as units. Thus, in the middle strand, there are seven units, in the fourth of which is found the base
A
.

If you have a strand, you can operate on it and change it in various ways. You can also produce additional strands, either by copying, or by cutting a strand in two. Some operations lengthen strands, some shorten them, and some leave their length alone.

Operations come in packets-that is, several to be performed together, in order. Such a packet of operations is a little like a programmed machine which moves up and down the strand doing things to it. These mobile machines are called "typographical enzymes"-

enzymes for short. Enzymes operate on strands one unit at a time, and are said to be "bound"

to the unit they are operating on at any given moment.

I will show how some sample enzymes act on particular strings. The first thing to know is that each enzyme likes to start out bound to a particular letter. Thus, there are four kinds of enzyme-those which prefer

A
, those which prefer
C
, etc. Given the sequence of operations which an enzyme performs, you can figure out which letter it prefers, but for now I'll just give them without explanation.

Here's a sample enzyme, consisting of three operations:

(1) Delete the unit to which the enzyme is bound (and then bind to the next unit to the right).

(2) Move one unit to the right.

(3) Insert a T (to the immediate right of this unit).

This enzyme happens to like to bind to A initially. And here's a sample strand:
ACA

What happens if our enzyme binds to the left
A
and begins acting? Step I deletes the A, so we are left with
CA
-and the enzyme is now bound to the
C
. Step 2 slides the enzyme rightwards, to the
A
, and Step 3 appends a
T
onto the end to form the strand
CAT
. And the enzyme has done its complete duty: it has transformed
ACA
into
CAT
.

What if it had bound itself to the right A of ACA? It would have deleted that A and moved off the end of the strand. Whenever this happens, the enzyme quits (this is a general principle). So the entire effect would just be to lop off one symbol.

Let's see some more examples. Here is another enzyme:

(1) Search for the nearest pyrimidine to the right of this unit.

(2) Go into Copy mode.

(3) Search for the nearest purine to the right of this unit.

(4) Cut the strand here (viz., to the right of the present unit).

Now this contains the terms "pyrimidine" and "purine". They are easy terms.
A
and
G
are called
purines
, and
C
and
T
are called
pyrimidines
. So searching for a pyrimidine merely means searching for the nearest
C
or
T.

Copy Mode and Double Strands

The other new term is
Copy mode
. Any strand can be "copied" onto another strand, but in a funny way. Instead of copying
A
onto
A,
you copy it onto
T
, and vice versa. And instead of copying
C
onto
C
, you copy it onto
G
, and vice versa. Note that a purine copies onto a pyrimidine, and vice versa. This is called
complementary base pairing
. The complements are shown below

Complement

Purinas
A
<====>
T
pyrimidines

G
<====>
C

You can perhaps remember this molecular pairing scheme by recalling that
A
chilles is paired with the
T
ortoise, and the
C
rab with his
G
enes.

When "copying" a strand, therefore, you don't actually copy it, but you manufacture its
complementary
strand. And this one will be written upside down above the original strand.

Let's see this in concrete terms. Let the previous enzyme act on the following strand (and that enzyme also happens to like to start at
A
):

CAAAGAGAATCCTCTTTGAT

There are many places it could start. Let's take the second A, for example. The enzyme binds to it, then executes step 1: Search for the nearest pyrimidine to the right. Well, this means a
C

or a
T.
The first one is a T somewhere near the middle of the strand, so that's where we go.

Now step 2: Copy mode. Well, we just put an upside-down
A
above our
T
. But that's not all, for Copy mode
remains in effect
until it is shut off-or until the enzyme is done, whichever comes first. This means that every base which is passed through by the enzyme while Copy mode is on will get a complementary base put above it. Step 3 says to look for a purine to the right of our
T.
That is the
G
two symbols in from the right-hand end. Now as we move up to that
G
, we must "copy"-that is, create a complementary strand. Here's what that gives: (editor’ s note, I can’t print upside down ie it is too much hard work so V = A and D = G

upside down)

The last step is to cut the strand. This will yield two pieces:

VDDVDVVVJ

CAAAGAGAATCCTCTTTG

and
AT
.

And the instruction packet is done. We are left with a double strand, however. Whenever this happens, we separate the two complementary strands from each other (general principle); so in fact our end product is a set of three strands:

AT, CAAAGAGGA, and CAAAGAGAATCCTCTTTG

Notice that the upside-down strand has been turned right side up, and thereby right and left have been reversed.

Now you have seen most of the typographical operations which can be carried out on strands. There are two other instructions which should be mentioned. One shuts off Copy mode; the other switches the enzyme from a strand to the upside-down strand above it. When this happens, if you keep the paper right side up, then you must switch "left" and "right" in all the instructions. Or better, you can keep the wording and just turn the paper around so the top strand becomes legible. If the "switch" command is

Other books

A Fragile Peace by Paul Bannister
A Creature of Moonlight by Rebecca Hahn
The Topaz Quest by Gill Vickery
Reunion by Laura Harner
Gods and Beasts by Denise Mina
B000FC1MHI EBOK by Delinsky, Barbara
The Magehound by Cunningham, Elaine
Echoes of Lies by Jo Bannister