Read Structure and Interpretation of Computer Programs Online
Authors: Harold Abelson and Gerald Jay Sussman with Julie Sussman
Exercise 5.50.
Use the compiler to compile the metacircular evaluator of
section
4.1
and run this program using the register-machine
simulator. (To compile more than one definition at a time, you can
package the definitions in a
begin
.) The resulting interpreter
will run very slowly because of the multiple levels of interpretation,
but getting all the details to work is an instructive exercise.
Exercise 5.51.
Develop a rudimentary implementation of Scheme in C (or some other
low-level language of your choice) by translating the explicit-control
evaluator of section
5.4
into C. In order to run this code
you will need to also
provide appropriate storage-allocation routines and other run-time
support.
Exercise 5.52.
As a counterpoint to exercise
5.51
, modify the compiler
so that it compiles Scheme procedures into sequences of C
instructions. Compile the metacircular evaluator of
section
4.1
to produce a Scheme interpreter written in C.
33
This is a theoretical statement. We are not claiming
that the evaluator's data paths are a particularly convenient or
efficient set of data paths for a general-purpose computer. For example,
they are not very good for implementing high-performance floating-point
calculations or calculations that intensively manipulate bit vectors.
34
Actually, the machine that runs
compiled code can be simpler than the interpreter machine, because we
won't use the
exp
and
unev
registers. The interpreter
used these to hold pieces of unevaluated expressions. With the
compiler, however, these expressions get built into the
compiled code that the register machine will run. For the same
reason,
we don't need the machine operations that deal with expression
syntax. But compiled code will use a few additional machine
operations (to represent compiled procedure objects) that didn't
appear in the explicit-control evaluator machine.
35
Notice, however, that our
compiler is a Scheme program, and the syntax procedures that it uses
to manipulate expressions are the actual Scheme procedures used with
the metacircular evaluator. For the explicit-control evaluator, in
contrast, we assumed that equivalent syntax operations were available
as operations for the register machine. (Of course, when we simulated
the register machine in Scheme, we used the actual Scheme procedures
in our register machine simulation.)
36
This procedure uses a feature of Lisp called
backquote
(or
quasiquote
) that is handy for constructing lists.
Preceding a list with a backquote symbol is much like quoting it,
except that anything in the list that is flagged with a comma is evaluated.
For example, if the value of
linkage
is the symbol
branch25
, then the expression
`((goto (label ,linkage)))
evaluates to the list
((goto (label branch25)))
.
Similarly, if the value of
x
is the list
(a b c)
, then
`(1 2 ,(car x))
evaluates to the list
(1 2 a)
.
37
We can't just
use the labels
true-branch
,
false-branch
, and
after-if
as shown above,
because there might be more than one
if
in the program.
The compiler uses the procedure
make-label
to generate labels.
Make-label
takes a symbol as argument and returns a new symbol
that begins with the given symbol. For example, successive calls to
(make-label 'a)
would return
a1
,
a2
, and so on.
Make-label
can be implemented similarly to the generation of
unique variable names in the query language, as follows:
(define label-counter 0)
(define (new-label-number)
(set! label-counter (+ 1 label-counter))
label-counter)
(define (make-label name)
(string->symbol
(string-append (symbol->string name)
(number->string (new-label-number)))))
38
We need machine operations to implement a data
structure for representing compiled procedures, analogous to the structure for
compound procedures described in section
4.1.3
:
(define (make-compiled-procedure entry env)
(list 'compiled-procedure entry env))
(define (compiled-procedure? proc)
(tagged-list? proc 'compiled-procedure))
(define (compiled-procedure-entry c-proc) (cadr c-proc))
(define (compiled-procedure-env c-proc) (caddr c-proc))
39
Actually, we signal an error when the target is not
val
and the linkage is
return
, since
the only place we request
return
linkages is in compiling
procedures, and our convention is that procedures return their
values in
val
.
40
Making a
compiler generate tail-recursive code might seem like a
straightforward idea. But most compilers for common languages,
including C and Pascal, do not do this, and therefore these languages
cannot represent iterative processes in terms of procedure call alone.
The difficulty with
tail recursion in these languages is that their
implementations use the stack to store procedure arguments and local
variables as well as return addresses. The Scheme implementations
described in this book store arguments and variables in memory to be
garbage-collected. The reason for using the stack for variables and
arguments is that it avoids the need for garbage collection in
languages that would not otherwise require it, and is generally
believed to be more efficient. Sophisticated Lisp compilers can, in
fact, use the stack for arguments without destroying tail recursion.
(See
Hanson 1990 for a description.) There is also some debate about
whether stack allocation is actually more efficient than garbage
collection in the first place, but the details seem to hinge on fine
points of computer architecture. (See
Appel 1987 and
Miller and Rozas
1994 for opposing views on this issue.)
41
The variable
all-regs
is bound to the list of names of all the registers:
(define all-regs '(env proc val argl continue))
42
Note that
preserving
calls
append
with three
arguments. Though the definition of
append
shown in this book
accepts only two arguments, Scheme standardly provides an
append
procedure that takes an arbitrary number of arguments.
43
We have used
the same symbol
+
here to denote both the source-language
procedure and the machine operation. In general there will not be a
one-to-one correspondence between primitives of the source language
and primitives of the machine.
44
Making the primitives into reserved
words is in general a bad idea, since a user cannot then rebind these
names to different procedures. Moreover, if we add reserved words to
a compiler that is in use, existing programs that define procedures
with these names will stop working. See
exercise
5.44
for ideas on how to avoid this
problem.
45
This is not true if we allow
internal definitions, unless we scan them out.
See exercise
5.43
.
46
This is the modification to variable lookup
required if we implement the scanning method to eliminate internal
definitions (exercise
5.43
). We will need
to eliminate these definitions in order for lexical addressing to
work.
47
Lexical addresses cannot be used to access variables in the global
environment, because these names can be defined and redefined
interactively at any time. With internal definitions scanned out, as
in exercise
5.43
, the only definitions the
compiler sees are those at top level, which act on the global
environment. Compilation of a definition does not cause the defined
name to be entered in the compile-time environment.
48
Of course, compiled procedures as well as interpreted
procedures are compound (nonprimitive). For compatibility with
the terminology used in the explicit-control evaluator, in this
section we will use “compound” to mean interpreted (as opposed
to compiled).
49
Now that the evaluator machine starts
with a
branch
, we must always initialize the
flag
register
before starting the evaluator machine. To start the machine at
its ordinary read-eval-print loop, we could use
(define (start-eceval)
(set! the-global-environment (setup-environment))
(set-register-contents! eceval 'flag false)
(start eceval))
50
Since a compiled procedure is an
object that the system may try to print, we also modify the system
print operation
user-print
(from section
4.1.4
)
so that it will not attempt to print the
components of a compiled procedure:
(define (user-print object)
(cond ((compound-procedure? object)
(display (list 'compound-procedure
(procedure-parameters object)
(procedure-body object)
'
((compiled-procedure? object)
(display '
(else (display object))))
51
We can do even better by extending the compiler
to allow compiled code to call interpreted procedures. See
exercise
5.47
.
52
Independent of the strategy of execution, we
incur significant overhead if we insist that errors encountered in
execution of a user program be detected and signaled, rather than being
allowed to kill the system or produce wrong answers. For example, an
out-of-bounds array reference can be detected by checking the validity
of the reference before performing it. The overhead of checking,
however, can be many times the cost of the array reference itself, and
a programmer should weigh speed against safety in determining whether
such a check is desirable. A good compiler should be able to produce
code with such checks, should avoid redundant checks, and should allow
programmers to control the extent and type of error checking in the
compiled code.
Compilers for popular languages, such as C and C++,
put hardly any error-checking operations into
running code, so as to make things run as fast as possible. As a
result, it falls to programmers to explicitly provide error checking.
Unfortunately, people often neglect to do this, even in
critical applications where speed is not a constraint. Their programs
lead fast and dangerous lives. For example, the notorious
“Worm”
that paralyzed the Internet in 1988 exploited the
UNIX
T
M
operating system's failure to check whether the input buffer has
overflowed in the finger daemon. (See Spafford 1989.)
53
Of course, with either the
interpretation or the compilation strategy we must also implement for
the new machine storage allocation, input and output, and all the
various operations that we took as “primitive” in our discussion of
the evaluator and compiler. One strategy for minimizing work here is
to write as many of these operations as possible in Lisp and then
compile them for the new machine. Ultimately, everything reduces to a
small kernel (such as garbage collection and the mechanism for
applying actual machine primitives) that is hand-coded for the new
machine.
54
This strategy leads to amusing tests of correctness of
the compiler, such as checking
whether the compilation of a program on the new machine, using the
compiled compiler, is identical with the
compilation of the program on the original Lisp system. Tracking
down the source of differences is fun but often frustrating, because
the results are extremely sensitive to minuscule details.
Abelson, Harold, Andrew Berlin, Jacob Katzenelson,
William McAllister,
Guillermo Rozas, Gerald Jay Sussman, and Jack Wisdom. 1992. The
Supercomputer Toolkit: A general framework for special-purpose
computing.
International Journal of High-Speed Electronics
3(3):337-361.
Allen, John. 1978.
Anatomy of Lisp.
New York: McGraw-Hill.
ANSI X3.226-1994.
American National Standard for Information
Systems – Programming Language – Common Lisp.
Appel, Andrew W. 1987. Garbage collection can be faster than stack
allocation.
Information Processing Letters
25(4):275-279.
Backus, John. 1978. Can programming be liberated from the von
Neumann style?
Communications of the ACM
21(8):613-641.
Baker, Henry G., Jr. 1978. List processing in real time on a serial
computer.
Communications of the ACM
21(4):280-293.
Batali, John, Neil Mayle, Howard Shrobe, Gerald Jay Sussman, and
Daniel Weise. 1982. The Scheme-81 architecture – System and chip.
In
Proceedings of the MIT Conference on Advanced Research in
VLSI,
edited by Paul Penfield, Jr. Dedham, MA: Artech House.