Read Structure and Interpretation of Computer Programs Online
Authors: Harold Abelson and Gerald Jay Sussman with Julie Sussman
The explicit-control evaluator of section 5.4 is a
register machine whose controller interprets Scheme programs. In this
section we will see how to run Scheme programs on a register machine
whose controller is not a Scheme interpreter.
The explicit-control evaluator machine is universal – it can carry out
any computational process that can be described in Scheme. The
evaluator's controller orchestrates the use of its data paths to
perform the desired computation. Thus, the evaluator's data paths are
universal: They are sufficient to perform any computation we desire,
given an appropriate controller.
Commercial general-purpose computers are register machines organized
around a collection of registers and operations that constitute
an efficient and convenient universal set of data paths.
The controller for a general-purpose machine is an interpreter for
a register-machine language like the one we have been using. This
language is called the native language of the machine, or simply
machine language. Programs written in machine language are
sequences of instructions that use the machine's data paths.
For example, the
explicit-control evaluator's instruction sequence
can be thought of as a machine-language program for a general-purpose
computer rather than as the controller for a specialized interpreter
machine.
There are two common strategies for bridging the gap between
higher-level languages and register-machine languages. The
explicit-control evaluator illustrates the
strategy of interpretation. An interpreter written in the native
language of a machine configures the machine to execute programs
written in a language (called the source language) that may
differ from the native language of the machine performing the
evaluation. The primitive procedures of the source language are
implemented as a library of subroutines written in the native language
of the given machine. A program to be interpreted (called the
source program) is represented as a data structure. The interpreter
traverses this data structure, analyzing the source program. As it
does so, it simulates the intended behavior of the source program by
calling appropriate primitive subroutines from the library.
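The interpretation strategy just described can be sketched in miniature. In the toy interpreter below, the source program is represented as Scheme list structure, and the primitives of the source language are implemented as a library of procedures in the host language; all of the names here (primitive-library, interpret) are illustrative assumptions, not part of the text.

```scheme
;; A minimal sketch of the interpretation strategy described above.
;; The source program is a data structure (Scheme list structure);
;; interpret traverses it and simulates its behavior by calling
;; primitive subroutines from a library.
(define primitive-library
  (list (cons '+ +)     ; source-language primitives, implemented
        (cons '* *)))   ; in the native language of the host

(define (interpret exp)
  (cond ((number? exp) exp)              ; analysis: a constant
        ((pair? exp)                     ; analysis: an application
         (let ((prim (cdr (assq (car exp) primitive-library)))
               (args (map interpret (cdr exp))))
           (apply prim args)))           ; simulate via the library
        (else (error "Unknown expression" exp))))

;; (interpret '(+ 1 (* 2 3)))  evaluates to 7
```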
In this section, we explore the alternative strategy of
compilation. A compiler for a given source language and machine
translates a source program into an equivalent program (called the
object program) written in the machine's native language. The
compiler that we implement in this section translates programs written
in Scheme into sequences of instructions to be executed using
the explicit-control evaluator machine's data paths.
Compared with interpretation, compilation can provide a great increase
in the efficiency of program execution, as we will explain below in
the overview of the compiler. On the other hand, an interpreter
provides a more powerful environment for interactive program
development and debugging, because the source program being executed
is available at run time to be examined and modified. In addition,
because the entire library of primitives is present, new programs can
be constructed and added to the system during debugging.
In view of the complementary advantages of compilation and
interpretation, modern program-development environments pursue a mixed
strategy. Lisp interpreters are generally organized so that
interpreted procedures and compiled procedures can call each other.
This enables a programmer to compile those parts of a program that are
assumed to be debugged, thus gaining the efficiency advantage of
compilation, while retaining the interpretive mode of execution for
those parts of the program that are in the flux of interactive
development and debugging. In section 5.5.7, after we have implemented
the compiler, we will show how to interface it with our interpreter to
produce an integrated interpreter-compiler development system.
Our compiler is much like our interpreter, both in its structure and in
the function it performs. Accordingly, the mechanisms used by the
compiler for analyzing expressions will be similar to those used by
the interpreter. Moreover, to make it easy to interface compiled and
interpreted code, we will design the compiler to generate code that
obeys the same conventions of register usage as the interpreter: The
environment will be kept in the env register, argument lists
will be accumulated in argl, a procedure to be applied will be
in proc, procedures will return their answers in val,
and the location to which a procedure should return will be kept in
continue.
In general, the compiler translates a source program into an object
program that performs essentially the same register operations as
would the interpreter in evaluating the same source program.
This description suggests a strategy for implementing a rudimentary
compiler: We traverse the expression in the same way the
interpreter does. When we encounter a register instruction that the
interpreter would perform in evaluating the expression, we do not
execute the instruction but instead accumulate it into a sequence. The
resulting sequence of instructions will be the object code. Observe
the efficiency advantage of compilation over interpretation. Each
time the interpreter evaluates an expression – for example,
(f 84 96) – it performs the work of
classifying the expression (discovering that this
is a procedure application) and testing for the end of the operand list
(discovering that there are two operands). With a
compiler, the expression is analyzed only once, when the
instruction sequence is generated at compile time. The object code
produced by the compiler contains only the instructions that evaluate
the operator and the two operands, assemble the argument list,
and apply the procedure (in proc) to the arguments (in argl).
This is the same kind of optimization we implemented in the
analyzing evaluator of section 4.1.7.
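As a reminder of that optimization, the analyzing evaluator of section 4.1.7 performs the syntactic dispatch once, producing an execution procedure that can then be run repeatedly; the compiler plays the same trick, but emits instructions instead of procedures. A simplified sketch of the analyze-once idea (not the book's full analyzer; the variable lookup here is a toy):

```scheme
;; Sketch of the analyze-once idea from section 4.1.7: syntactic
;; analysis happens a single time, yielding an execution procedure
;; that runs without re-examining the expression.
(define (analyze exp)
  (cond ((number? exp)
         (lambda (env) exp))                    ; constant
        ((symbol? exp)
         (lambda (env) (cdr (assq exp env))))   ; toy variable lookup
        (else (error "Unknown expression" exp))))

;; The expression is classified once...
(define run-x (analyze 'x))
;; ...and the resulting procedure may be executed many times:
;; (run-x '((x . 84)))  evaluates to 84
```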
But there are further opportunities to gain efficiency in compiled code.
As the interpreter runs, it follows a process that must be applicable
to any expression in the language. In contrast, a given segment of
compiled code is meant to execute some particular expression. This
can make a big difference, for example in the use of the stack to
save registers. When the interpreter evaluates an expression, it must
be prepared for any contingency. Before evaluating a subexpression,
the interpreter saves all
registers that will be needed later, because
the subexpression might require an arbitrary evaluation.
A compiler, on the other hand, can exploit the structure of the
particular expression it is processing to generate code that avoids
unnecessary stack operations.
As a case in point, consider the combination (f 84 96). Before
the interpreter evaluates the operator of the combination, it prepares
for this evaluation by saving the registers containing the operands
and the environment, whose values will be needed later. The
interpreter then evaluates the operator to obtain the result in
val, restores the saved registers, and finally moves the result from
val to proc. However, in the particular expression we are
dealing with, the operator is the symbol f, whose evaluation is
accomplished by the machine operation lookup-variable-value,
which does not alter any registers. The compiler that we implement in
this section will take advantage of this fact and generate code that
evaluates the operator using the instruction

(assign proc (op lookup-variable-value) (const f) (reg env))

This code not only avoids the unnecessary saves and
restores but also assigns the value of the lookup directly to
proc, whereas the interpreter would obtain the result in
val and then move this to proc.
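For contrast, the work the interpreter performs around operator evaluation can be sketched as register-machine code. This is a simplified schematic of what the explicit-control evaluator does, not its literal instruction sequence:

```scheme
;; Schematic of the interpreter's operator evaluation (simplified;
;; the actual explicit-control evaluator differs in detail):
(save continue)
(save env)
(save unev)                      ; the operand list, needed later
(assign continue (label ev-appl-did-operator))
;; ... general evaluation of the operator via eval-dispatch ...
(restore unev)
(restore env)
(assign proc (reg val))          ; move the result from val to proc
```

For the operator f, every one of these saves and restores is unnecessary, which is exactly what the single compiled assign instruction above exploits.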
A compiler can also optimize access to the environment. Having
analyzed the code, the compiler can in many cases know in which frame
a particular variable will be located and access that frame directly,
rather than performing the lookup-variable-value search. We
will discuss how to implement such variable access in
section 5.5.6. Until then, however, we will
focus on the kind of register and stack optimizations described above.
There are many other optimizations that can be performed by a
compiler, such as coding primitive operations “in line” instead of
using a general apply mechanism (see
exercise 5.38); but we will not emphasize these here.
Our main goal in this section is to illustrate the compilation process
in a simplified (but still interesting) context.
In section 4.1.7 we modified our original
metacircular interpreter to separate analysis from execution. We
analyzed each expression to produce an execution procedure that took
an environment as argument and performed the required operations. In
our compiler, we will do essentially the same analysis. Instead of
producing execution procedures, however, we will generate sequences of
instructions to be run by our register machine.
The procedure compile is the top-level dispatch in the compiler.
It corresponds to the eval procedure of section 4.1.1, the
analyze procedure of section 4.1.7, and the
eval-dispatch entry point of the explicit-control evaluator in
section 5.4.1.
The compiler, like the interpreters, uses the
expression-syntax procedures defined in section 4.1.2.
Compile performs a case
analysis on the syntactic type of the expression to be compiled. For
each type of expression, it dispatches to a specialized
code generator:
(define (compile exp target linkage)
  (cond ((self-evaluating? exp)
         (compile-self-evaluating exp target linkage))
        ((quoted? exp) (compile-quoted exp target linkage))
        ((variable? exp)
         (compile-variable exp target linkage))
        ((assignment? exp)
         (compile-assignment exp target linkage))
        ((definition? exp)
         (compile-definition exp target linkage))
        ((if? exp) (compile-if exp target linkage))
        ((lambda? exp) (compile-lambda exp target linkage))
        ((begin? exp)
         (compile-sequence (begin-actions exp)
                           target
                           linkage))
        ((cond? exp) (compile (cond->if exp) target linkage))
        ((application? exp)
         (compile-application exp target linkage))
        (else
         (error "Unknown expression type -- COMPILE" exp))))
Compile and the code generators that it calls take two arguments
in addition to the expression to compile. There is a target,
which specifies the register in which the compiled code is to return
the value of the expression. There is also a linkage
descriptor, which describes how the code resulting from the
compilation of the expression should proceed when it has finished its
execution. The linkage descriptor can require that the code do one of
the following three things:

- continue at the next instruction in sequence (this is specified by the linkage descriptor next),
- return from the procedure being compiled (this is specified by the linkage descriptor return), or
- jump to a named entry point (this is specified by using the designated label as the linkage descriptor).
For example, compiling the expression 5 (which is
self-evaluating) with a target of the val register and a
linkage of next should produce the instruction

(assign val (const 5))

Compiling the same expression with a linkage of return should
produce the instructions

(assign val (const 5))
(goto (reg continue))

In the first case, execution will continue with the next instruction
in the sequence. In the second case, we will return from a procedure
call. In both cases, the value of the expression will be placed into
the target val register.
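For completeness, compiling the same expression with a label as the linkage should produce code that jumps to that entry point. The label name after-call below is hypothetical, chosen only for illustration:

```scheme
;; Compiling 5 with target val and the (hypothetical) label
;; after-call as linkage: store the value, then jump to the label.
(assign val (const 5))
(goto (label after-call))
```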
Each code generator returns an instruction sequence containing
the object code it has generated for the expression. Code generation
for a compound expression is accomplished by combining the output from
simpler code generators for component expressions, just as
evaluation of a compound expression is accomplished by evaluating the
component expressions.
The simplest method for combining instruction sequences is a procedure
called append-instruction-sequences. It takes as arguments any
number of instruction sequences that are to be executed sequentially;
it appends them and returns the combined sequence. That is, if
⟨seq1⟩ and ⟨seq2⟩ are sequences of instructions, then
evaluating

(append-instruction-sequences ⟨seq1⟩ ⟨seq2⟩)