The PLTP Archive


J Strother Moore

reporting joint work with
Robert S. Boyer
and with contributions by
Grant O. Passmore
March, 2018

Abstract

The first general-purpose automatic theorem prover explicitly designed for the mathematics behind program verification was the Edinburgh Pure Lisp Theorem Prover otherwise known as PLTP. PLTP was created in 1972 and 1973 by Boyer and me while we were in the Department of Computational Logic (formerly, the Metamathematics Unit), School of Artificial Intelligence, University of Edinburgh, Scotland. It was a direct response to the inadequacies of the then-popular resolution-based first-order provers (upon which we had been working) for program verification purposes. The Annotated Bibliography below provides scanned images of the 1973 POP-2 code for PLTP, its input, and its output, as well as links to other relevant material including POP-2 reference manuals, my dissertation, and articles about PLTP published in 1973 and 1975. In addition, we provide two modern (2017) recreations of PLTP, one in ACL2 and the other in OCaml. But first I would like to describe the origins of PLTP and of this archive.

Edinburgh, 1972 — 1973

Boyer and I met in Edinburgh in September, 1971. Initially we worked together on resolution provers, specifically Bob Kowalski's SL-resolution and structure sharing in both linear and general resolution. This led us to explore the creation of a precursor to Prolog called Baroque which allowed one to use SL-resolution to interpret programs and to prove theorems about them (see page 70 of Moore's dissertation). We then moved from Baroque to a Lisp subset implemented in Baroque (see page 75 op. cit.). See Part I of my dissertation below for a report on the first year of our collaboration. (See the Annotated Bibliography below for proper citations.)

For example, we could prove that there exists an X such that the recursively defined (LENGTH X) is equal to 3. But we could not prove that (LENGTH (APPEND X Y)) is (+ (LENGTH X) (LENGTH Y)). The latter conjecture requires mathematical induction.

Our focus on proving properties of programs as a test bed for our efforts in resolution led us by 1972 to the realization that we needed to build a prover capable of induction.

The standard approach to program verification in the early 1970s was to (a) annotate a program with assertions, (b) use one of several methods to generate ``verification condition'' formulas, and then (c) use an ad hoc simplifier or resolution theorem prover to attempt to prove those formulas. It was not uncommon for researchers of the day to add additional ``lemmas'' or specially tailored ``axioms'' to the set of input assumptions in order for the prover to succeed.

The reason for this, as we pointed out in the Preface of our 1979 book A Computational Logic (Academic Press):

One reason that researchers have had to assume ``lemmas'' so freely is that they have not implemented the principle of mathematical induction in their theorem-proving systems. Since mathematical induction is a fundamental rule of inference for the objects about which computer programmers think (e.g., integers, sequences, trees), it is surprising that anyone would implement a theorem-prover for program verification that could not make inductive arguments.
We speculated on the possibility that researchers (unconsciously) assumed that the only roles for induction was to unwind loops and handle procedure calls — roles fulfilled by the program semantics. But regardless of why, it is a fact induction was not supported even though many verification conditions involved properties of inductively constructed objects.

Thus 1972 found us proving theorems about recursive list processing functions, by hand on the blackboard, and discussing the techniques and heuristics we were using. The main questions were

These exercises soon led us to implement our methods in the Edinburgh Pure Lisp Theorem Prover, PLTP. Briefly put, PLTP was a fully automatic theorem prover for properties of recursive functions over list structures. Since our focus was on induction, we limited the language to a homegrown version of Pure Lisp supporting just the object NIL and ordered pairs constructed by CONS. The language included CAR and CDR, a 3-place IF (actually named ``COND'' in early work), and EQUAL. Natural numbers were just abbreviations for lists of NILs giving rise to Peano-like definitions of arithmetic. The user could define recursive functions and challenge the prover to prove properties. It was not possible for the user to offer hints, lemmas, or other help.

In this setting we could explore automatic proofs of the most elementary results of elementary number theory as well as properties sequences and trees and of functions that mixed those theories, e.g., we could pursue the kinds of proofs required by program verification.

Roughly speaking, PLTP took an input formula (as a term) and simplified it by repeatedly applying ``symbolic evaluation'' (see EVALUATE in listings-f-and-h), ``rewriting'' (see REWRITE) and ``reduction'' (see REDUCE) until it no longer changed. After simplification, PLTP applied ``fertilization'' (see FERTILIZE), ``generalization'' (see GENRLIZE), and induction (see INDUCT). Fertilization replaced one side of an equality induction hypothesis by the other in certain occurrences in the induction conclusion and ``discarded'' the hypothesis. Generalization replaced some terms by new variables, possibly restricting the values of the new variables to satisfy certain automatically synthesized ``type functions.'' Finally, selection of an induction schema was driven by the failures of symbolic evaluation to expand calls of recursive functions in the conjecture.

We structured the 1973 PLTP somewhat differently than our subsequent NQTHM and ACL2 provers. Let me digress to point out some design choices that readers familiar with our more recent provers may find surprising.

1973 was a very active time and many different experimental versions of the system were tried, mostly involving minor tweaks to heuristics. To maintain our sanity we had a regression suite of function definitions and theorems, called the proveall, and ran the modified code on the proveall every time we changed the system.

By the end of 1973 we had a reasonably large corpus of Pure Lisp definitions and properties in our proveall regression suite. Among the properties proved fully automatically were that list concatenation is associative, reverse is its own inverse, and insertion sort is correct. The last was stated as two theorems, that insertion sort produces ordered output and that, for all x, the number of times x occurs in the input is equal to the number of times x occurs in the output. Images of lineprinter listings of PLTP's proveall and the proof output are included in the bibliography below. (By the way, the second half of the correctness of insertion sort — the part about SORT preserving the number of occurrences — is not proved in the 18 July, 1973 proveall listing we have, indeed it is marked with a * in the input, indicating that it was a challenge we were working on. It is listed among the successful theorems in the dissertation some months later.)

Here for example is 1973 PLTP proof that SORT produced ORDERED output. The HTML text below was manually transcribed from pages 127-130 of the proveall output of 18 July, 1973. I may have inadventently introduced typos. I have deleted some blank lines.

Definitions of non-primitives used in the proof were printed at the end of the proof log. Note than IMPLIES and AND were non-primitives: our logical connector was the 3-place COND (subsequently renamed IF). ADDTOLIS is the user-defined insertion function used by SORT; LTE is the user-defined Peano ``≤'' operator. The proof below shows PLTP doing two generalizations and three inductions — a straightforward induction on a linear list, a ``special'' induction with two base cases, and an induction on two variables simultaneously. The proof shows PLTP ``discovering'' the lemmas that insertion preserves orderedness and that i ≤ j or j ≤ i.

[T 6 8]  [ 16.22 18 JULY 1973]

THEOREM TO BE PROVED:
[ORDERED [SORT A]]

MUST TRY INDUCTION.

INDUCT ON A.

THE THEOREM TO BE PROVED IS NOW:
[AND [ORDERED [SORT NIL]]
     [IMPLIES [ORDERED [SORT A]] [ORDERED [SORT [CONS A1 A]]]]]

WHICH IS EQUIVALENT TO:
[COND [ORDERED [SORT A]] [ORDERED [ADDTOLIS A1 [SORT A]]] T]

GENERALIZE COMMON SUBTERMS BY REPLACING [SORT A] BY GENRL1.

THE GENERALIZED TERM IS:
[COND [ORDERED GENRL1] [ORDERED [ADDTOLIS A1 GENRL1]] T]

MUST TRY INDUCTION.

(SPECIAL CASE REQUIRED)

INDUCT ON GENRL1.

THE THEOREM TO BE PROVED IS NOW:
[AND
 [COND [ORDERED NIL] [ORDERED [ADDTOLIS A1 NIL]] T]
 [AND
    [COND [ORDERED [CONS GENRL11 NIL]]
    .     [ORDERED [ADDTOLIS A1 [CONS GENRL11 NIL]]]
    .     T]
    [IMPLIES [COND [ORDERED [CONS GENRL12 GENRL1]]
                   [ORDERED [ADDTOLIS A1 [CONS GENRL12 GENRL1]]]
                   T]
             [COND [ORDERED [CONS GENRL11 [CONS GENRL12 GENRL1]]]
                   [ORDERED [ADDTOLIS A1 [CONS GENRL11 [CONS GENRL12 GENRL1]]]]
                   T]]]]

WHICH IS EQUIVALENT TO:

[COND [LTE A1 GENRL11] T [LTE GENRL11 A1]]

MUST TRY INDUCTION.

INDUCT ON GENRL11 AND A1.

THE THEOREM TO BE PROVED IS NOW:
[AND [AND [COND [LTE A1 NIL] T [LTE NIL A1]]
          [COND [LTE NIL GENRL11] T [LTE GENRL11 NIL]]]
     [IMPLIES [COND [LTE A1 GENRL11] T [LTE GENRL11 A1]]
              [COND [LTE [CONS A11 A1] [CONS GENRL111 GENRL11]]
                    T
                    [LTE [CONS GENRL111 GENRL11] [CONS A11 A1]]]]]

WHICH IS EQUIVALENT TO:
T

FUNCTION DEFINITIONS:
[SORT [LAMBDA [X] [COND X [ADDTOLIS [CAR X] [SORT [CDR X]]] NIL]]]

[ORDERED
 [LAMBDA
  [X]
  [COND
      X
      [COND [CDR X] [COND [LTE [CAR X] [CAR [CDR X]]] [ORDERED [CDR X]] NIL] T]
      T]]]

[LTE [LAMBDA [X Y] [COND X [COND Y [LTE [CDR X] [CDR Y]] NIL] T]]]

[ADDTOLIS
 [LAMBDA
    [X Y]
    [COND Y
          [COND [LTE X [CAR Y]] [CONS X Y] [CONS [CAR Y] [ADDTOLIS X [CDR Y]]]]
          [CONS X NIL]]]]

[IMPLIES [LAMBDA [X Y] [COND X [COND Y T NIL] T]]]

[AND [LAMBDA [X Y] [COND X [COND Y T NIL] NIL]]]

GENERALIZATIONS:
GENRL1 = [SORT A]

PROFILE: [/ [A] , / E N R / E N R G S1 [GENRL1] , / E N R / E N R / E N R / E N
R / E N R [GENRL11 A1] , / E N R / E N R .]

TIME: 57.06 SECS.

Our choice of Pure Lisp as the logic in which to conduct our proofs was inspired by John McCarthy's A Basis for a Mathematical Theory of Computation. Indeed, we were so enamored with the idea of LISP as a logic that PLTP used IF to represent the conjunction of subgoals. The style in which we organized PLTP was inspired by similar provers by Bob Boyer's dissertation advisor, W. W. Bledsoe.

We implemented PLTP in POP-2, a imperative Lisp-like programming language with an Algol-like syntax developed at the University of Edinburgh by Robin Popplestone and Rod Burstall. POP-2 was the language supported by the computing platform available to us in 1973: an ICL 4130 processor with 64K bytes of 2 microsecond memory shared between 8 interactive jobs controlled by teletype input from the users. File input and output was by paper tape and lineprinter. The modern reader may have a hard time imagining theorem proving research conducted under such constraints!

We regarded PLTP as an experimental vehicle for exploring inductive proofs. We were the only users and our computing resources were extremely limited. We therefore took many shortcuts that we abandoned as our subsequent provers attracted other users and computing resources grew exponentially. The most egregious shortcut was that the prover did not check that the s-expression provided as input was well-formed. It was just implicit that we'd call the prover only on formulas! There were many such implicit understandings, e.g., no definitional principle was implemented, some variable symbols were implicitly reserved for introduction by the prover, variable symbols beginning with the letters I through N were implicitly treated as numeric, etc. In addition, we inevitably made coding mistakes, some of which we found and fixed and others of which apparently escaped our notice until we assembled this archive! We include a discussion of all ``bugs'' discovered in the 14 September, 1973, listings below. Importantly in this regard, our whole understanding of what constituted a ``bug'' in a theorem prover has evolved in the 45 years from PLTP's origin as a experimental system used by two people to the adoption of theorem provers and proof assistants as trusted industrial tools.

A brief paper about PLTP, Proving Theorems about LISP Functions was presented at the Third International Joint Conference on Artificial Intelligence (IJCAI-73), 20-23 August 1973, at Stanford University. A longer version of the paper, with the same title, subsequently appeared in the JACM in 1975. Proper citations are given below.

PLTP was the beginning of a sequence of Boyer-Moore-Kaufmann provers for applicative Lisp: the unnamed prover described in A Computational Logic, Boyer and Moore, Academic Press, 1979, as well as NQTHM and ACL2.

PLTP itself passed into history.

Reconstructions of PLTP

Around 2014, Grant Passmore emailed Boyer and Moore that he had implemented PLTP in OCaml working entirely from Moore's dissertation and the 1975 JACM paper. Passmore's interest was primarily driven by a desire to understand our induction heuristics so that he could implement induction in the Imandra theorem prover of Aesthetic Integration, Ltd. Grant has kindly permitted us to link to his OCaml implementation of PLTP, which we call PLTP(O), below.

In the spring of 2017, while looking through old boxes of papers (trying to find the documentation for the 77 Editor for which Boyer and I invented the notion of ``piece table'' used by Charles Simonyi in the Xerox PARC text editor Bravo and subsequently in Microsoft Word) we found a complete listing of the source code for PLTP as of 14 September, 1973. We also found various 5-hole paper tapes purporting to contain various files from the era. But absent a 5-hole paper tape reader, our only readable record of PLTP's source code are the faded lineprinter listings from 1973. These include many versions of various component files and one complete listing of all relevant POP-2 source code files for PLTP made on 14 September, 1973. That listing was labeled in 1973 with the handwritten note ``the theorem prover as reported in J's thesis.''

We also found lineprinter listings of the proveall and the proveall output of a version of PLTP predating the dissertation by a couple of months.

In the summer of 2017, working directly from the original POP-2 sources I recoded PLTP in ACL2. However, I took certain liberties: I corrected certain bugs I discovered and refactored some subroutines for clarity without (intentionally) changing their behavior. While any computation can be mimicked in any Turing complete programming language like ACL2, I simply (gratuitously?) balked at trying to mimic exactly the methods of some PLTP routines — memory limitations of the 1970s drove us to short-cuts, convoluted coding style, the use of numerous side-effects to global variables and data structures, and use of the peculiar feature of POP-2 in which one could ``surreptitiously'' pass arguments to future subroutine calls by leaving ``extra'' results on a stack.

In my reconstruction I tended to use more straightforward functional programming idioms, like defining one ACL2 function to compute the results of a given POP-2 routine and another ACL2 function to compute the side-effects. All of this — including the bugs discovered and the ``gratuitous'' stylistic changes — is documented below in great detail.

We call the resulting prover PLTP(A). PLTP(A) focuses on what we call the September core: the proof methods implemented as of 14 September, 1973. It ignores extraneous issues like the command one would have invoked in 1973 to run the regression suite, the code to reproduce exactly the way we prettyprinted formulas in 1973 and the English proof narrative produced by PLTP. I then ``successfully'' ran the 1973 regression with PLTP(A). I quote ``successfully'' because two theorems failed, but I argue in the PLTP(A) material below that PLTP could not have proved these theorems and they were in the suite as challenges to work on.

We also undertook an ultimately unsuccessful effort to get the original POP-2 sources for PLTP to run on a modern machine. While no modern implementation of POP-2 exists (as far as we're aware) Passmore found a binary image for a Russian BESM-6 computer booted with a POP-2 image from the early 1970s and a BESM-6 emulator. We call that system the emulated Russian POP-2 here.

The emulated Russian POP-2 was quite difficult to work with — it is undocumented, fragile, and has extremely limited memory resources (the BESM-6 had even less memory than the ICL 4130 on which PLTP ran). Certain legal POP-2 programs cannot be compiled or are compiled or emulated incorrectly. The compiler silently truncates identifiers to 6 characters (whereas the 1973 Edinburgh compiler truncated to 8), thus causing some invisible name collisions. The compiler also got stack overflows trying to process even modestly nested POP-2 expressions and we had to refactor the code with many jumps and assignments to get it to compile. We use the name PLTP(R) for the PLTP code refactored to compile under the emulated Russian POP-2. But even after apparently successful compilation, PLTP(R) produced runtime errors or unexplained failures on some proof attempts of theorems in the regression.

We ultimately gave up on this attempt since it was clear that, even if we could get the emulated Russian POP-2 to run the regression successfully on some modified version of the code it would not demonstrate how the original source code ran. The ACL2 and OCaml implementations, PLTP(A) and PLTP(O), are better illustrations of the capabilities of PLTP than PLTP(R).

Annotated Bibliography