J Strother Moore

Robert S. Boyer

Grant O. Passmore

March, 2018

The first general-purpose automatic theorem prover explicitly designed for
the mathematics behind program verification was
the *Edinburgh Pure Lisp Theorem Prover* otherwise known as
*PLTP*. PLTP was created in 1972 and 1973 by Boyer and me while we were
in the Department of Computational Logic (formerly, the Metamathematics
Unit), School of Artificial Intelligence, University of Edinburgh, Scotland.
It was a direct response to the inadequacies of the then-popular
resolution-based first-order provers (upon which we had been working) for
program verification purposes.
The Annotated Bibliography below
provides scanned images of the 1973 POP-2 code for PLTP, its input, and its
output, as well as links to other relevant material including POP-2 reference
manuals, my dissertation, and articles about PLTP published in 1973 and 1975.
In addition, we provide two modern (2017) recreations of PLTP, one in ACL2
and the other in OCaml. But first I would like to describe the origins of
PLTP and of this archive.

For example,
we could prove that there exists an `X`

such that the recursively defined
`(LENGTH X)`

is equal to `3`

. But we could not prove
that `(LENGTH (APPEND X Y))`

is `(+ (LENGTH X) (LENGTH Y))`

.
The latter conjecture requires *mathematical induction*.

Our focus on proving properties of programs as a test bed for our efforts in resolution led us by 1972 to the realization that we needed to build a prover capable of induction.

The standard approach to program verification in the early 1970s was to (a)
annotate a program with assertions, (b) use one of several methods to
generate ``verification condition'' formulas, and then (c) use an *ad
hoc* simplifier or resolution theorem prover to attempt to prove those
formulas. It was not uncommon for researchers of the day to add additional
``lemmas'' or specially tailored ``axioms'' to the set of input assumptions
in order for the prover to succeed.

The reason for this, as we pointed out in the Preface of our 1979 book
*A Computational Logic* (Academic Press):

One reason that researchers have had to assume ``lemmas'' so freely is that they have not implemented the principle of mathematical induction in their theorem-proving systems. Since mathematical induction is a fundamental rule of inference for the objects about which computer programmers think (e.g., integers, sequences, trees), it is surprising that anyone would implement a theorem-prover for program verification that could not make inductive arguments.We speculated on the possibility that researchers (unconsciously) assumed that the only roles for induction was to unwind loops and handle procedure calls — roles fulfilled by the program semantics. But regardless of why, it is a fact induction was not supported even though many verification conditions involved properties of inductively constructed objects.

Thus 1972 found us proving theorems about recursive list processing functions, by hand on the blackboard, and discussing the techniques and heuristics we were using. The main questions were

- When did we decide to use induction?
- How did we decide what to induct upon?
- How did we formulate the induction hypotheses?
- How did we simplify the induction conclusions so we could appeal to the induction hypotheses?

These exercises soon led us to implement our methods in the Edinburgh Pure
Lisp Theorem Prover, PLTP. Briefly put, PLTP was a fully automatic theorem
prover for properties of recursive functions over list structures. Since our
focus was on induction, we limited the language to a homegrown version of
Pure Lisp supporting just the object
`NIL`

and ordered pairs constructed by `CONS`

. The
language included `CAR`

and `CDR`

, a
3-place `IF`

(actually named ```COND`

'' in early work),
and `EQUAL`

. Natural numbers were just abbreviations for lists
of `NIL`

s giving rise to Peano-like definitions of arithmetic.
The user could define recursive functions and challenge the prover to prove
properties. It was not possible for the user to offer hints, lemmas, or
other help.

In this setting we could explore automatic proofs of the most elementary results of elementary number theory as well as properties sequences and trees and of functions that mixed those theories, e.g., we could pursue the kinds of proofs required by program verification.

Roughly speaking, PLTP took an input formula (as a term)
and *simplified* it by repeatedly applying ``symbolic evaluation''
(see `EVALUATE`

in
listings-f-and-h), ``rewriting''
(see `REWRITE`

) and ``reduction'' (see `REDUCE`

) until
it no longer changed. After
simplification, PLTP applied ``fertilization'' (see `FERTILIZE`

),
``generalization'' (see `GENRLIZE`

), and induction (see `INDUCT`

).
Fertilization replaced one side of an equality induction hypothesis by the
other in certain occurrences in the induction conclusion and ``discarded''
the hypothesis. Generalization replaced some terms by new variables,
possibly restricting the values of the new variables to satisfy certain automatically synthesized
``type functions.'' Finally, selection of an induction schema was driven by
the failures of symbolic evaluation to expand calls of recursive functions in
the conjecture.

We structured the 1973 PLTP somewhat differently than our subsequent NQTHM and ACL2 provers. Let me digress to point out some design choices that readers familiar with our more recent provers may find surprising.

- PLTP did not operate on clauses but on terms typically representing conjunctions held together by
`IF`

s. That is, all of the subgoals of a proof were expressed in a single term. We didn't introduce clauses, the ``waterfall,'' and the ``pool'' until the prover described in*A Computational Logic*(Academic Press, 1979). - The PLTP process of repeatedly
applying
`EVALUATE`

,`REWRITE`

, and`REDUCE`

until the result was stable was called by the non-word ``normalation'' in the early copies of Moore's dissertation and in the 14 September, 1973, source code (see the definition of`NORMALATE`

on page 14 of Listing-F). However, Moore's PhD examining committee (Bernard Meltzer, Rod Burstall, Robin Milner, and David Cooper) asked us to name the process with an English word. We chose ``simplification'' and we defined the function`SIMPLIFY`

in subsequent versions of PLTP. See, for example, page 18 of Listing-A, printed 2 November, 1973. - Unlike NQTHM and ACL2, PLTP's simplifier had no provision for
user-suggested lemmas. PLTP heuristically expanded calls of defined
functions (using some of the same heuristics used today) and applied rewrite
rules about the six primitives, e.g.,
(CAR (CONS X Y)) = X (IF X Y Y) = Y

But there was no way to store or use rules derived from previously proved theorems. - In PLTP, symbolic evaluation (
`EVALUATE`

) used side-effects to maintain a group of global and locally-bound ``special'' variables to be used by induction. The main such variable was named`ANALYSIS`

and it accumulated ``bomb lists'' recording recursive calls that blocked the symbolic expansion of recursive functions.`ANALYSIS`

was accessed during induction processing by`PICKINDVARS`

. - PLTP could only do structural induction. For example (using POP-2's
square bracket notation for s-expressions), PLTP's inductive argument to prove
`[`

when*p*X]`X`

is a linear list was:[AND [

Our subsequent provers used a schema like this for linear lists:*p*NIL] [IMPLIES [*p*X] [*p*[CONS X1 X]]]](AND (IMPLIES (NOT (CONSP X)) (

Obvious generalizations of this latter scheme allow one to provide induction hypotheses about arbitrary objects smaller than*p*X)) (IMPLIES (AND (CONSP X) (*p*(CDR X))) (*p*X)))`X`

, not just explicit structural components of`X`

. - Finally, PLTP collected fertilization, generalization, and induction into
a routine that was facetiously given the name
`ARTIFINTEL`

(for ``artificial intelligence'') because:COMMENT 'THIS FUNCTION APPLIES FERTILIZATION AND IF THAT FAILS TRIES GENERALIZING AND INDUCTING. IT IS CAREFUL TO WORK ONLY ON THE FIRST CONJUNCT IF THE THEOREM IS A CONJUNCT. FOR THIS IT GETS THE NAME "ARTIFICIAL INTELLIGENCE", BEING ABOUT THE SMARTEST PROGRAM IN THE THEOREM PROVER.`;

In our subsequent provers, in which the goal was represented by the implicit conjunction of clauses in the pool, we generalized this architecture and collected simplification, fertilization, generalization, and (eventually other processes like destructor elimination and irrelevance elimination) into the waterfall with induction at the bottom. Since PLTP's`ARTIFINTEL`

focused generalization and induction on the first conjunct, PLTP acted as though it had a pool and (small) waterfall. Unlike our subsequent provers, PLTP eagerly simplified and fertilized all the formulas in the pool before resorting to generalization, etc.

1973 was a very active time and many different experimental versions of the
system were tried, mostly involving minor tweaks to heuristics. To maintain
our sanity we had a regression suite of function definitions and theorems,
called *the proveall*, and ran the modified code on the proveall every
time we changed the system.

By the end of 1973 we had a reasonably large corpus of Pure Lisp definitions
and properties in our proveall regression suite. Among the properties proved
fully automatically were that list concatenation is associative, reverse is
its own inverse, and insertion sort is correct. The last was stated as two
theorems, that insertion sort produces ordered output and that, for
all *x*, the number of times *x* occurs in the input is equal to
the number of times *x* occurs in the output. Images of lineprinter
listings of PLTP's proveall and the proof output are included in the
bibliography below. (By the way, the second half of the correctness of
insertion sort — the part about `SORT`

preserving the number
of occurrences — is not proved in the 18 July, 1973 proveall listing we
have, indeed it is marked with a `*`

in the input, indicating that
it was a challenge we were working on. It is listed among the successful
theorems in the dissertation some months later.)

Here for example is 1973 PLTP proof that `SORT`

produced `ORDERED`

output.
The HTML text below was manually transcribed from pages 127-130 of the
proveall output of 18 July,
1973. I may have inadventently introduced typos. I have deleted some
blank lines.

Definitions of non-primitives used in the proof were printed at the end of
the proof log. Note than `IMPLIES`

and `AND`

were
non-primitives: our logical connector was the 3-place `COND`

(subsequently renamed `IF`

). `ADDTOLIS`

is the
user-defined insertion function used by
`SORT`

; `LTE`

is the user-defined Peano ``≤''
operator. The proof below shows PLTP doing two generalizations and three
inductions — a straightforward induction on a linear list, a
``special'' induction with two base cases, and an induction on two variables
simultaneously. The proof shows PLTP ``discovering'' the lemmas that
insertion preserves orderedness and that
*i ≤ j* or *j ≤ i*.

[T 6 8] [ 16.22 18 JULY 1973] THEOREM TO BE PROVED: [ORDERED [SORT A]] MUST TRY INDUCTION. INDUCT ON A. THE THEOREM TO BE PROVED IS NOW: [AND [ORDERED [SORT NIL]] [IMPLIES [ORDERED [SORT A]] [ORDERED [SORT [CONS A1 A]]]]] WHICH IS EQUIVALENT TO: [COND [ORDERED [SORT A]] [ORDERED [ADDTOLIS A1 [SORT A]]] T] GENERALIZE COMMON SUBTERMS BY REPLACING [SORT A] BY GENRL1. THE GENERALIZED TERM IS: [COND [ORDERED GENRL1] [ORDERED [ADDTOLIS A1 GENRL1]] T] MUST TRY INDUCTION. (SPECIAL CASE REQUIRED) INDUCT ON GENRL1. THE THEOREM TO BE PROVED IS NOW: [AND [COND [ORDERED NIL] [ORDERED [ADDTOLIS A1 NIL]] T] [AND [COND [ORDERED [CONS GENRL11 NIL]] . [ORDERED [ADDTOLIS A1 [CONS GENRL11 NIL]]] . T] [IMPLIES [COND [ORDERED [CONS GENRL12 GENRL1]] [ORDERED [ADDTOLIS A1 [CONS GENRL12 GENRL1]]] T] [COND [ORDERED [CONS GENRL11 [CONS GENRL12 GENRL1]]] [ORDERED [ADDTOLIS A1 [CONS GENRL11 [CONS GENRL12 GENRL1]]]] T]]]] WHICH IS EQUIVALENT TO: [COND [LTE A1 GENRL11] T [LTE GENRL11 A1]] MUST TRY INDUCTION. INDUCT ON GENRL11 AND A1. THE THEOREM TO BE PROVED IS NOW: [AND [AND [COND [LTE A1 NIL] T [LTE NIL A1]] [COND [LTE NIL GENRL11] T [LTE GENRL11 NIL]]] [IMPLIES [COND [LTE A1 GENRL11] T [LTE GENRL11 A1]] [COND [LTE [CONS A11 A1] [CONS GENRL111 GENRL11]] T [LTE [CONS GENRL111 GENRL11] [CONS A11 A1]]]]] WHICH IS EQUIVALENT TO: T FUNCTION DEFINITIONS: [SORT [LAMBDA [X] [COND X [ADDTOLIS [CAR X] [SORT [CDR X]]] NIL]]] [ORDERED [LAMBDA [X] [COND X [COND [CDR X] [COND [LTE [CAR X] [CAR [CDR X]]] [ORDERED [CDR X]] NIL] T] T]]] [LTE [LAMBDA [X Y] [COND X [COND Y [LTE [CDR X] [CDR Y]] NIL] T]]] [ADDTOLIS [LAMBDA [X Y] [COND Y [COND [LTE X [CAR Y]] [CONS X Y] [CONS [CAR Y] [ADDTOLIS X [CDR Y]]]] [CONS X NIL]]]] [IMPLIES [LAMBDA [X Y] [COND X [COND Y T NIL] T]]] [AND [LAMBDA [X Y] [COND X [COND Y T NIL] NIL]]] GENERALIZATIONS: GENRL1 = [SORT A] PROFILE: [/ [A] , / E N R / E N R G S1 [GENRL1] , / E N R / E N R / E N R / E N R / E N R [GENRL11 A1] , / E N R / E N R .] TIME: 57.06 SECS.

Our choice of Pure Lisp as the logic in which to conduct our proofs was
inspired by John
McCarthy's A Basis for a
Mathematical Theory of Computation. Indeed, we were so enamored with the
idea of LISP as a logic that PLTP used `IF`

to represent the
conjunction of subgoals. The style in which we organized PLTP was inspired
by similar provers by Bob Boyer's dissertation advisor, W. W. Bledsoe.

We implemented PLTP in POP-2, a imperative Lisp-like programming language with an Algol-like syntax developed at the University of Edinburgh by Robin Popplestone and Rod Burstall. POP-2 was the language supported by the computing platform available to us in 1973: an ICL 4130 processor with 64K bytes of 2 microsecond memory shared between 8 interactive jobs controlled by teletype input from the users. File input and output was by paper tape and lineprinter. The modern reader may have a hard time imagining theorem proving research conducted under such constraints!

We regarded PLTP as an experimental vehicle for exploring inductive proofs.
We were the only users and our computing resources were extremely limited.
We therefore took many shortcuts that we abandoned as our subsequent provers
attracted other users and computing resources grew exponentially. The most
egregious shortcut was that the prover did not check that the s-expression
provided as input was well-formed. It was just implicit that we'd call the
prover only on formulas! There were many such implicit understandings, e.g.,
no definitional principle was implemented, some variable symbols were
implicitly reserved for introduction by the prover, variable symbols
beginning with the letters `I`

through `N`

were
implicitly treated as numeric, etc. In addition, we inevitably made coding
mistakes, some of which we found and fixed and others of which apparently
escaped our notice until we assembled this archive! We include a discussion
of all ``bugs'' discovered in the 14 September, 1973, listings below.
Importantly in this regard, our whole understanding of what constituted a
``bug'' in a theorem prover has evolved in the 45 years from PLTP's origin as
a experimental system used by two people to the adoption of theorem provers
and proof assistants as trusted industrial tools.

A brief paper about PLTP, *Proving Theorems about LISP Functions* was
presented at the *Third International Joint Conference on Artificial
Intelligence* (IJCAI-73), 20-23 August 1973, at Stanford University. A
longer version of the paper, with the same title, subsequently appeared in
the *JACM* in 1975. Proper citations are given below.

PLTP was the beginning of a sequence of Boyer-Moore-Kaufmann provers for
applicative Lisp: the unnamed prover described in *A Computational
Logic*, Boyer and Moore, Academic Press, 1979, as well
as NQTHM
and ACL2.

PLTP itself passed into history.

In the spring of 2017, while looking through old boxes of papers (trying to find the documentation for the 77 Editor for which Boyer and I invented the notion of ``piece table'' used by Charles Simonyi in the Xerox PARC text editor Bravo and subsequently in Microsoft Word) we found a complete listing of the source code for PLTP as of 14 September, 1973. We also found various 5-hole paper tapes purporting to contain various files from the era. But absent a 5-hole paper tape reader, our only readable record of PLTP's source code are the faded lineprinter listings from 1973. These include many versions of various component files and one complete listing of all relevant POP-2 source code files for PLTP made on 14 September, 1973. That listing was labeled in 1973 with the handwritten note ``the theorem prover as reported in J's thesis.''

We also found lineprinter listings of the proveall and the proveall output of a version of PLTP predating the dissertation by a couple of months.

In the summer of 2017, working directly from the original POP-2 sources I recoded PLTP in ACL2. However, I took certain liberties: I corrected certain bugs I discovered and refactored some subroutines for clarity without (intentionally) changing their behavior. While any computation can be mimicked in any Turing complete programming language like ACL2, I simply (gratuitously?) balked at trying to mimic exactly the methods of some PLTP routines — memory limitations of the 1970s drove us to short-cuts, convoluted coding style, the use of numerous side-effects to global variables and data structures, and use of the peculiar feature of POP-2 in which one could ``surreptitiously'' pass arguments to future subroutine calls by leaving ``extra'' results on a stack.

In my reconstruction I tended to use more straightforward functional programming idioms, like defining one ACL2 function to compute the results of a given POP-2 routine and another ACL2 function to compute the side-effects. All of this — including the bugs discovered and the ``gratuitous'' stylistic changes — is documented below in great detail.

We call the resulting prover
*PLTP(A)*. PLTP(A) focuses on what we call the *September core*:
the proof methods implemented as of 14 September, 1973. It ignores
extraneous issues like the command one would have invoked in 1973 to run the
regression suite, the code to reproduce exactly the way we prettyprinted
formulas in 1973 and the English proof narrative produced by PLTP. I then
``successfully'' ran the 1973 regression with PLTP(A). I quote
``successfully'' because two theorems failed, but I argue in the PLTP(A)
material below that PLTP could not have proved these theorems and they were
in the suite as challenges to work on.

We also undertook an ultimately unsuccessful effort to get the original POP-2
sources for PLTP to run on a modern machine. While no modern implementation
of POP-2 exists (as far as we're aware) Passmore found a binary image for a
Russian BESM-6 computer booted with a POP-2 image from the early 1970s and a
BESM-6 emulator. We call that system the *emulated Russian POP-2* here.

The emulated Russian POP-2 was quite difficult to work with — it is
undocumented, fragile, and has extremely limited memory resources (the BESM-6
had even less memory than the ICL 4130 on which PLTP ran). Certain legal
POP-2 programs cannot be compiled or are compiled or emulated
incorrectly. The compiler silently truncates identifiers to 6 characters
(whereas the 1973 Edinburgh compiler truncated to 8), thus causing some
invisible name collisions. The compiler also got stack overflows trying to
process even modestly nested POP-2 expressions and we had to refactor the
code with many jumps and assignments to get it to compile. We use the
name *PLTP(R)* for the PLTP code refactored to compile under the
emulated Russian POP-2. But even after apparently successful compilation,
PLTP(R) produced runtime errors or unexplained failures on some proof
attempts of theorems in the regression.

We ultimately gave up on this attempt since it was clear that, even if we could get the emulated Russian POP-2 to run the regression successfully on some modified version of the code it would not demonstrate how the original source code ran. The ACL2 and OCaml implementations, PLTP(A) and PLTP(O), are better illustrations of the capabilities of PLTP than PLTP(R).

- Moore's dissertation, Part I of which describes our work on structure sharing in resolution theorem proving,
including our Prolog-like language Baroque and its Lisp-like counterpart, and Part II of which fully describes
PLTP: Computational
Logic: Structure Sharing and Proof of Program Properties. PhD Thesis,
Department of Computational Logic, University of Edinburgh, 1973.
- The IJCAI-73 paper on
PLTP: Proving
Theorems about LISP Functions, R.S. Boyer and J S. Moore, Third
International Joint Conference on Artificial Intelligence (IJCAI-73),
Stanford University, 1973, pp. 486-493.
- The longer, archival publication of PLTP in
JACM: Proving
Theorems about LISP Functions, R.S. Boyer and J S. Moore.
*Journal of the Association for Computing Machinery*,**22**(1), 1975, pp.129-144. (Note: The last page of the pdf file lists the submission date as `September, 1993.' This is an obvious OCR error; the paper was submitted in September, 1973 and appeared in 1975!) - POP-2 Reference Manual by R.M. Burstall and R.J. Popplestone,
Department of Machine Intelligence and Perception, University of Edinburgh, 1972.
- POP-2/4100 User's Manual by
R. D. Dunn, School of Artificial Intelligence, University of Edinburgh, 1972.
- Scanned OCR Images of our 1973
lineprinter listings. These images are faded and sometimes hard to read
but are the definitive historical record. Among the listings are all the files
in the implementation of PLTP described in Moore's dissertation and the output
of a proveall regression run from several months earlier. If you want to see
what PLTP did in July, 1973, look at that!
**Warning:**The OCR data in these scans is often incorrect, e.g., 8's coded as B's, 3's etc.*Do not trust negative search results over these scans!* - A text file containing the POP-2
source code from 14 September, 1973. Listings F and H of the scans
contain everything needed to reconstruct the September core of the prover
described in Moore's dissertation. Because of the OCR errors in the scans we
undertook the manual transcription of these two listings into simple text
form. Barring transcription errors (which we have worked hard to eliminate)
this is a convenient, machine-readable, listing of the original POP-2 code
for PLTP as described in Moore's dissertation.
- Bugs and oddities in the 14 September,
1973, code for PLTP. These bugs were noticed in our recent (2017) inspection
of the code after 45 years of experience writing theorem provers. These
problems do not manifest themselves in any of the published examples of PLTP
proofs. Furthermore, inspection of the few source listings with creation
dates after 14 September, 1973, reveals that several of these bugs were
subsequently noticed and fixed by Boyer and Moore.
- Our 2017 re-implementations of PLTP:
- PLTP(A): a recoding of the PLTP sources into the applicative Common Lisp of ACL2.
- PLTP(O): an implementation of PLTP in OCaml based on Moore's dissertation and the 1975 JACM paper.
- PLTP(R): ultimately unsuccessful attempt to get modified PLTP source code to compile and run in the emulated Russian POP-2.

- PRETTY-PRINT by R. S. Boyer, Department of Computational Logic, Memo 64, University of Edinburgh, 1972(?) — Boyer's linear-time prettyprint algorithm that produces beautiful and ``exact'' output (in the sense of reliably using the full line width when necessary). Most prettyprinters use heuristics to guess when it is time to print on a new line. PLTP(R) is running Boyer's algorithm. PLTP(A) and PLTP(O) use the standard prettyprinting of their host systems, ACL2 and OCaml. It should be noted that ACL2's prettyprinter is a direct descendant of Boyer's algorithm modified to handle the more complex s-expressions of Common Lisp.