Read Structure and Interpretation of Computer Programs Online
Authors: Harold Abelson and Gerald Jay Sussman with Julie Sussman
The query system may seem to exhibit quite a bit of intelligence in
using the rules to deduce the answers to the queries above. Actually,
as we will see in the next section, the system is following a
well-determined algorithm in unraveling the rules. Unfortunately,
although the system works impressively in the
append
case, the
general methods may break down in more complex cases, as we will see
in section
4.4.3
.
Exercise 4.61.
The following rules implement a
next-to
relation that finds
adjacent elements of a list:
(rule (?x next-to ?y in (?x ?y . ?u)))
(rule (?x next-to ?y in (?v . ?z))
(?x next-to ?y in ?z))
What will the response be to the following queries?
(?x next-to ?y in (1 (2 3) 4))
(?x next-to 1 in (2 1 3 1))
Exercise 4.62.
Define rules to implement the
last-pair
operation of
exercise
2.17
, which returns a list containing the last
element of a nonempty list. Check your rules on queries such as
(last-pair (3) ?x)
,
(last-pair (1 2 3) ?x)
, and
(last-pair (2 ?x) (3))
.
Do your rules work correctly on queries such as
(last-pair ?x (3))
?
Exercise 4.63.
The following data base (see Genesis 4) traces the genealogy of the
descendants of Ada back to Adam, by way of Cain:
(son Adam Cain)
(son Cain Enoch)
(son Enoch Irad)
(son Irad Mehujael)
(son Mehujael Methushael)
(son Methushael Lamech)
(wife Lamech Ada)
(son Ada Jabal)
(son Ada Jubal)
Formulate rules such as “If
S
is the son of
F
, and
F
is the son of
G
, then
S
is the grandson of
G
”
and “If
W
is the wife of
M
, and
S
is the son of
W
, then
S
is the son of
M
” (which was supposedly
more true in biblical times than today) that will enable the query
system to find the grandson of Cain; the sons of Lamech; the grandsons
of Methushael.
(See exercise
4.69
for some rules to
deduce more complicated relationships.)
In section
4.4.4
we will present an
implementation of the query interpreter as a collection of procedures.
In this section we give an overview that explains the general
structure of the system independent of low-level implementation
details. After describing the implementation of the interpreter, we
will be in a position to understand some of its limitations and some
of the subtle ways in which the query language's logical operations
differ from the operations of mathematical logic.
It should be apparent that the query evaluator must perform some kind
of search in order to match queries against facts and rules in the
data base. One way to do this would be to implement the query system
as a nondeterministic program, using the
amb
evaluator of
section
4.3
(see
exercise
4.78
). Another possibility is to manage
the search with the aid of streams. Our implementation follows this
second approach.
The query system is organized around two central operations called
pattern matching
and
unification
. We first describe
pattern matching and explain how this operation, together with the
organization of information in terms of streams of frames, enables us
to implement both simple and compound queries. We next discuss
unification, a generalization of pattern matching needed to implement
rules. Finally, we show how the entire query interpreter fits
together through a procedure that classifies expressions in a manner
analogous to the way
eval
classifies expressions for the
interpreter described in section
4.1
.
A
pattern matcher
is a program that tests whether some datum
fits a specified pattern. For example, the data list
((a b) c (a
b))
matches the pattern
(?x c ?x)
with the pattern variable
?x
bound to
(a b)
. The same data list matches the pattern
(?x ?y ?z)
with
?x
and
?z
both bound to
(a b)
and
?y
bound to
c
. It also matches the pattern
((?x ?y) c (?x ?y))
with
?x
bound to
a
and
?y
bound
to
b
. However, it does not match the pattern
(?x a ?y)
,
since that pattern specifies a list whose second element is the symbol
a
.
The pattern matcher used by the query system takes as inputs a
pattern, a datum, and a
frame
that specifies bindings for
various pattern variables. It checks whether the datum matches the
pattern in a way that is consistent with the bindings already in the
frame. If so, it returns the given frame augmented by any bindings
that may have been determined by the match. Otherwise, it indicates
that the match has failed.
For example, using the pattern
(?x ?y ?x)
to match
(a b a)
given an empty frame will return a frame specifying that
?x
is
bound to
a
and
?y
is bound to
b
. Trying the match
with the same pattern, the same datum, and a frame specifying that
?y
is bound to
a
will fail. Trying the match with the
same pattern, the same datum, and a frame in which
?y
is bound
to
b
and
?x
is unbound will return the given frame
augmented by a binding of
?x
to
a
.
The pattern matcher is all the mechanism that is needed to process
simple queries that don't involve rules. For instance, to process the
query
(job ?x (computer programmer))
we scan through all assertions in the data base and select those that
match the pattern with respect to an initially empty frame. For each
match we find, we use the frame returned by the match to instantiate
the pattern with a value for
?x
.
The testing of patterns against frames is organized through the use of
streams. Given a single frame, the matching process runs through the
data-base entries one by one. For each data-base entry, the matcher
generates either a special symbol indicating that the match has failed
or an extension to the frame. The results for all the data-base
entries are collected into a stream, which is passed through a filter
to weed out the failures. The result is a stream of all the frames
that extend the given frame via a match to some assertion in the data
base.
67
In our system, a query takes an input stream of frames and performs
the above matching operation for every frame in the stream, as
indicated in figure
4.4
. That is, for each frame in
the input stream, the query generates a new stream consisting of all
extensions to that frame by matches to assertions in the data base.
All these streams are then combined to form one huge stream, which
contains all possible extensions of every frame in the input stream.
This stream is the output of the query.
To answer a simple query, we use the query with an input stream
consisting of a single empty frame. The resulting output stream
contains all extensions to the empty frame (that is, all answers to
our query). This stream of frames is then used to generate a stream
of copies of the original query pattern with the variables
instantiated by the values in each frame, and this is the stream that
is finally printed.
The real elegance of the stream-of-frames implementation is evident
when we deal with compound queries. The processing of compound
queries makes use of the ability of our matcher to demand that a match
be consistent with a specified frame. For example, to handle the
and
of two queries, such as
(and (can-do-job ?x (computer programmer trainee))
(job ?person ?x))
(informally, “Find all people who can do the job of a computer
programmer trainee”), we first find all entries that match the
pattern
(can-do-job ?x (computer programmer trainee))
This produces a stream of frames, each of which contains a binding for
?x
. Then for each frame in the stream we find all entries that
match
(job ?person ?x)
in a way that is consistent with the given binding for
?x
. Each
such match will produce a frame containing bindings for
?x
and
?person
. The
and
of two queries can be viewed as a series
combination of the two component queries, as shown in
figure
4.5
. The frames that pass through the first
query filter are filtered and further extended by the second query.
Figure
4.6
shows the analogous method for computing the
or
of two queries as a parallel combination of the two component
queries. The input stream of frames is extended separately by each
query. The two resulting streams are then merged to produce the final
output stream.
Even from this high-level description, it is apparent that the
processing of compound queries can be slow.
For example, since a query may produce more than one output frame for
each input frame, and each query in an
and
gets its input frames
from the previous query, an
and
query could, in the worst case,
have to perform a number of matches that is exponential in the number
of queries (see exercise
4.76
).
68
Though systems for handling only simple queries are quite practical,
dealing with complex queries is extremely difficult.
69
From the stream-of-frames viewpoint, the
not
of some query acts
as a filter that removes all frames for which the query can be
satisfied. For instance, given the pattern
(not (job ?x (computer programmer)))
we attempt, for each frame in the input stream, to produce extension
frames that satisfy
(job ?x (computer programmer))
. We remove
from the input stream all frames for which such extensions exist. The
result is a stream consisting of only those frames in which the
binding for
?x
does not satisfy
(job ?x (computer
programmer))
. For example, in processing the query