Guarantees provided by top-level evaluation
We often refer to the ``top-level loop,'' or just ``the loop'' when
the context is understood. The loop is the interactive read-eval-print loop
with which the user interacts to issue commands, query the ACL2 logical
world, test functions, conjectures, and systems, etc. This documentation
topic mainly addresses the logical guarantees of the loop. For practical
advice about interacting with and configuring the loop see ld.
Ld is the ACL2 implementation of the loop (as a macro that calls a
state-using function with many parameters which may be set by the
user). Ld is called initially when ACL2 is fired up.
But here we are concerned with what it means, logically, when a term
tm evaluates without error to a value v in the loop.
We do not formalize macroexpansion here, but rather, we deal with
so-called “translated” terms, where macros and constants
have been expanded away. We also do not discuss here how ACL2 evaluates
expressions; see evaluation for such discussion.
(Those interested in implementation issues may also read the long comment in
ACL2 source file interface-raw.lisp that is labeled “Essay on
Evaluation in ACL2”.)
See soundness for a more general discussion of soundness-related
issues (including, for example, interrupts and trust tags).
A Strawman Proposal on the Meaning of Top-Level Evaluation
One might like to think that for any term, tm, a non-erroneous
means that the formula
(equal tm 'v)
is a theorem and can be proved by the prover.
For example, let sq be a :logic mode function that squares its
argument. (For the sake of a later example, let us also make the guard of sq be t and verify the guard by fixing (with rfix) the argument to be a rational, but those aspects of sq are
irrelevant in our immediate use of it below.)
(defun sq (x)
(declare (xargs :guard t :verify-guards t))
(let ((x (rfix x)))
(* x x)))
then the evaluation
ACL2 !>(sq 3)
suggests that (equal (sq 3) 9) is a theorem and that is indeed
provable by the ACL2 theorem prover.
But the situation is quite subtle and deserves further discussion. That
is what this topic is about.
Spoiler Alert: We will not give logical semantics to every
non-erroneous evaluation in this discussion. We only hint at how it can be
done for many terms. Furthermore, even for those terms whose evaluations do
correspond to theorems, the logical theory in which those theorems can be
proved is not the theory supported by the ACL2 prover but an extension of it.
We ultimately describe below the so-called evaluation theory which is
an extension of ACL2's proof theory. So this documentation topic
discusses two logical theories. Because any theory in which proofs are done
might be called the ``proof theory'' and in this section we need to clearly
distinguish between the two theories being discussed, we'll refer to the
theory supported by the ACL2 theorem prover as the prover's
Problems Raised by :Program Mode Functions
Suppose prog-sq is defined as a :program mode function
that squares its argument. Then the top-level evaluation
ACL2 !>(prog-sq 3)
clearly should not suggest
(equal (prog-sq 3) 9)
is a theorem. :Program mode functions are not defined in the
prover's theory. (:Program mode functions may not terminate and any
naive attempt to add their definitions to the prover's theory as axioms could
result in unsoundness.) The top-level loop has the ability to evaluate
:program mode expressions but it is ``magic,'' unsupported by proof.
(Non-termination by the evaluator can't manifest unsoundness: such
computations just never produce an answer.) The evaluation of :program
mode expressions is supported in the loop primarily as a way to execute
commands like defun and defthm. But it is also a convenience
that allows ACL2 to be used as a programming language (e.g., to
prototype a system one might eventually admit to the logic and prove theorems
Therefore, when attempting to define the logical meaning of top-level
evaluations we might restrict our attention to :logic mode terms
only. But this is not sufficient for several reasons.
:Program mode functions raise another problem discussed below in
connection with apply$.
Problems Raised by Single-Threaded Objects
Evaluation of terms involving single-threaded objects or ``stobjs''
raise problems, even if the terms involved are in :logic mode. Here is
a top-level evaluation sequence
ACL2 !>(defstobj st fld)
ACL2 !>(update-fld 3 st)
ACL2 !>(update-fld (sq (fld st)) st)
ACL2 !>(fld st)
but the strawman conjecture from the last evaluation, (equal (fld st)
9), is not a theorem. However, a related formula is a theorem and can be
proved by the prover.
(implies (stp st)
(let* ((st (update-fld 3 st))
(st (update-fld (sq (fld st)) st)))
(equal (fld st) 3)))
This reminds us that the top-level loop tracks changes to each ``live''
stobj and records them in the ACL2 state. Those changes must be
reflected in any suggested theorems.
Problems Raised by Apply$
Apply$ calls apply$-userfn to handle the application of
user-defined function symbols. Apply$-userfn is undefined but might be
constrained by warrant hypotheses (see defwarrant). Recall
that warrants link the functions approved by defwarrant to their names
and constrain (apply$ 'fn (list a1 ... an)) to be (fn a1 ... an)
under certain tameness conditions. As noted at the bottom of the
discussion of defwarrant, there is a model of the prover's theory in
which all warrants issued by defwarrant are true. The top-level loop
assumes that all those warrants are true. But the prover's theory does not,
because explicit warrant hypotheses are essential to avoiding the so-called
``local problem'' (see Lesson 12 of introduction-to-apply$).
One might wonder why we have warrants at all, aside from the local
problem. The key reason is that inconsistency can result from
non-terminating recursion if we had the following
Unsound Axiom Schema:
(equal (apply$ 'f (list a1 ... an)) (f a1 ... an))
for every defined function f. See the russell example in
introduction-to-apply$. So warrants and the restrictions enforced by
defwarrant keep the prover's theory consistent. One might then ask
``how do we get away with letting the top-level loop apply$ all badged
:program mode functions?'' The answer is simple: :program mode
functions are not axiomatized so we don't have a theory to worry about!
To illustrate the difference between the behavior of the top-level loop
and the prover, suppose the function sq, defined above, has been
and then consider the top-level evaluation
ACL2 !>(apply$ 'sq '(3))
This is possible because the warrant for sq is assumed true in the
top-level loop and that warrant tells the loop that (apply$ 'sq '(3)) =
However, (equal (apply$ 'sq '(3)) '9) is not a theorem in the
prover's theory. Instead, this is a theorem
(implies (warrant sq)
(equal (apply$ 'sq '(3)) '9))
where (warrant sq) is just a convenient abbreviation for
Thus, to extend the strawman conjecture to functions in which apply$
is ancestral would require tracking the warrants relevant to the execution
path (or punting and collecting all warrants) and amending the conjecture to
add those warrants as hypotheses to the equality.
But apply$ introduces another source of complexity: mixed-mode
functions (see mixed-mode-functions). Recall the :program mode
function prog-sq which squares its argument. We can assign a badge to
prog-sq and then define a :logic mode function that apply$s
(defun logic-sq (x)
(declare (xargs :mode :logic))
(apply$ 'prog-sq (list x)))
Since the top-level loop can evaluate applications of :program mode
symbols we see the following evaluation.
ACL2 !>(logic-sq 3)
Note that (logic-sq 3) is a :logic mode term and logic-sq
can even be warranted with defwarrant. But neither
(equal (logic-sq 3) '9)
(implies (warrant logic-sq) (equal (logic-sq 3) '9))
is a theorem in the prover's theory because prog-sq is undefined in
that theory. For what it is worth, the prover can prove
(implies (warrant logic-sq) (equal (logic-sq 3) (apply$ 'prog-sq '(3))))
This example contradicts the suggestion, just above, that if we properly
considered warrants we could give logical meaning to evaluations of
:logic mode terms involving apply$.
Problems Raised by Constrained Functions
Constrained functions raise problems if they have been given
attachments (see defattach). The top-level loop uses attachments to
compute values consistent with the constraints, but the prover's theory does
not. Attachments are useful for building and testing instances of models
consistent with the constraints.
For example, let (nonneg-rat x) be constrained to return some
(encapsulate (((nonneg-rat *) => *))
(local (defun nonneg-rat (x)
(declare (ignore x))
(and (rationalp (nonneg-rat x))
(<= 0 (nonneg-rat x)))))
It is impossible to evaluate (nonneg-rat 3) in the top-level loop.
But we can attach another function to it provided we can prove that function
always returns a nonnegative rational. Since sq, as defined above to
rfix its argument as a rational, satisfies that constraint, we can attach
sq to nonneg-rat.
(defattach nonneg-rat sq)
and then we can ``evaluate'' calls of nonneg-rat in the loop, getting
results that are consistent with its constraint (but overly specific!).
ACL2 !>(nonneg-rat 3)
But of course the strawman conjecture (equal (nonneg-rat 3) '9) is
not a theorem of the prover's theory.
The Prover's Theory
The ACL2 logic formalizes an applicative (functional and side-effect free)
extension of a subset of Common Lisp (see common-lisp). It is
described in chapter 6 of Computer-Aided
Reasoning: An Approach, by Kaufmann, Manolios, and Moore, as an extension
of quantifier-free first-order logic with equality (section 6.1). The syntax
is that of translated terms though the user is allowed to use a more
flexible syntax which can be extended by defining constant symbols and
macros (see defconst and defmacro). To see the formal
translation of such a term use the command :trans. In section
6.2 of the book, axioms are added to characterize the basic data objects of
numbers (integers, rationals, and complex rationals), characters, strings,
symbols, and ordered pairs. In section 6.3, the ordinals up to epsilon-0 are
constructed from ordered pairs and integers; a well-founded ``less than''
relation is also defined. But the book's construction of the ordinals is now
obsolete. A new representation was implemented after the book was published.
See ordinals. In Section 6.4, a conservative Principle of
Definition (see defun and defuns or mutual-recursion)
is introduced, allowing the addition of axioms defining new function symbols.
In Section 6.5, a Principle of Induction up to epsilon-0 is described.
Induction and recursion are duals: every admissible recursive function
suggests an induction and vice versa and this duality is used by the ACL2
prover to select an often appropriate induction scheme for a conjecture. See
induction and hints for ways to influence this selection.
Finally, in section 6.6, the book introduces encapsulate to introduce
constrained functions and functional instantiation (a derived rule of
inference akin to second order instantiation), and a variety of other ways of
conservatively adding new function symbols.
The ACl2 prover proves theorems in the extension of the above theory
obtained by adding the axioms introduced by events successfully
carried out in the user's session. The most common such axiom-adding events
are defun, defchoose, and encapsulate, which all
conservatively extend the theory by the addition of axioms about new function
symbols. But defun is just a special case of defuns which adds a
mutually recursive clique of new names (see also mutual-recursion).
Defstobj and defabsstobj add recognizers, constructors, and
accessors for single-threaded objects (aka ``stobjs'') but logically
just use defun to add new functions and then syntactically restrict
their use in code. Finally, we provide a means of adding an arbitrary
formula as an axiom, defaxiom, but strongly discourage its
use. There are many other events that add axioms, e.g., defun-nx
and defun-sk but these are defined as macros that expand into the
primitives just listed. In addition, there is a facility for including files
of previously admitted events, include-book.
Note that an event of the form (skip-proofs EV) does not introduce
any axioms other that those added by the event, EV. However, this
skip-proofs event does assume that all proof obligations introduced
by EV are indeed provable in the prover's theory.
Given a user's session, we call the above described theory the prover's
Now recall the strawman proposal for what evaluation in the top-level loop
means, i.e., that the evaluation
might mean that the formula
(equal tm 'v)
is a theorem. We have illustrated with :program mode functions, uses
of apply$, and of constrained functions that this strawman conjecture is
not necessarily a theorem of the prover's theory!
However, there is a theory in which many of these conjectures are
The Evaluation Theory
The evaluation theory is obtained from the prover's theory as
Instead of treating stobj names as variables, the evaluation theory
treats them as abbreviations for the ``current value'' of the stobj,
specifically, the constant obtained by composing the sequence of all updates
to the stobj's fields carried out so far in the top-level loop. Thus, for
example, in the evaluation theories created by this sequence of top-level
ACL2 !>(defstobj st fld) ; 
ACL2 !>(update-fld 3 st) ; 
ACL2 !>(update-fld (* (fld st) (fld st)) st) ; 
st is an abbreviation for (NIL) after evaluation , an
abbreviation for (3) after evaluation , and an abbreviation for
(9) after evaluation .
Every warrant created by defwarrant is assumed true as an
For every (defattach f g) event in the prover's theory the
axiom (i.e., constraints) on f is replaced by the axiom (equal (f
...) (g ...)) in the evaluation theory. (Of course this replacement also
takes place when using the more general form for defattach,
(defattach ... (f g ...) ...).)
Given the restrictions enforced by defun, defstobj,
defwarrant, apply$, encapsulate, and defattach, the
evaluation theory is consistent if the prover's theory is consistent and free
of defaxiom events. (Those interested in reading a proof of this claim
are welcome to read the “Essay on Defattach” in ACL2 source file
Furthermore, the top-level evaluation
(equal tm 'v)
is a theorem in the evaluation theory provided
- tm is a :logic mode term, and
- every symbol that reaches the first argument of any apply$-userfn in
the execution of tm is in :logic mode.
The ACL2 prover cannot generally prove these theorems, since it operates
in the prover's theory. However, as noted, there are often ways to encode
the conjectures into ACL2 formulas that are provable — in particular,
by adding suitable warrant hypotheses — though the ACL2 system does not
provide tools for doing so.
Some Examples of Top-Level Evaluations
In this section we illustrate some consequences of the guarantee that for
any term, tm, satisfying certain restrictions, if tm evaluates
to a value v, then the term
(equal tm 'v)
is a theorem of the evaluation theory.
Each command below is enumerated and after the entire sequence we make
observations about the results and implications. It might also help to read
the section Top-Level Evaluation of Apply$ in the documentation for
apply$, which describes how apply$ is implemented in the
ACL2 !>(include-book "projects/apply/top" :dir :system)
ACL2 !>(defun my-cons (x y) ; 
(declare (xargs :mode :program))
(cons x y))
ACL2 !>(defbadge my-cons) ; 
MY-CONS now has the badge (APPLY$-BADGE 2 1 . T) but has
ACL2 !>(apply$ 'my-cons '(3 4)) ; 
(3 . 4)
ACL2 !>(thm (equal (my-cons 3 4) '(3 . 4))) ; 
******** FAILED ********
ACL2 !>(verify-termination my-cons) ; 
ACL2 !>(apply$ 'my-cons '(3 4)) ; 
ACL2 Error in TOP-LEVEL: The value of APPLY$-USERFN is not
specified on MY-CONS because MY-CONS has not been warranted.
ACL2 !>(defwarrant my-cons) ; 
MY-CONS is now warranted by APPLY$-WARRANT-MY-CONS, with
badge (APPLY$-BADGE 2 1 . T).
ACL2 !>(apply$ 'my-cons '(3 4)) ; 
(3 . 4)
ACL2 !>ACL2 !>(thm (equal (my-cons 3 4) '(3 . 4))) ; 
******** FAILED ********
ACL2 !>(thm ;
(implies (warrant my-cons)
(equal (my-cons 3 4) '(3 . 4))))
In events  and  my-cons is introduced as a badged :program
mode function. Command  shows that we can apply$ 'my-cons in
the top-level loop and get the ``expected'' value. This is ``magic'' on
several levels! There is no axiom in the evaluation theory defining
my-cons and there is no warrant connecting the symbol 'my-cons to
the function my-cons. The failure of the first thm command, ,
shows that the prover's theory knows nothing of the behavior of apply$
on 'my-cons. After converting my-cons to :logic mode with
verify-termination in  we can no longer evaluate the application
of my-cons at the top-level, as shown by ! This is surprising
because one might expect ACL2 to be able to do more with a :logic mode
function than its :program mode counterpart. But in the evolving
evaluation theory as of , my-cons is a :logic mode function with
no warrant and so logically speaking (apply$ 'my-cons ...) is
undefined. Computing a value for it at  would violate the guarantee that
values of :logic mode terms can be proved correct in the evaluation
theory — i.e., that the computed value can be derived logically. Such
a derivation is impossible without a warrant. However, after
defwarrant, in , confirms that it is sound to issue a
warrant (linking the symbol 'my-cons to the function my-cons via
apply$), the evaluation of the 'my-cons term in  succeeds. The
defwarrant soundly extends the proof theory by adding a definition for
the warrant function for my-cons and extends the evaluation theory by
assuming the warrant for my-cons is true. This latter extension is also
sound. However,  shows that we still cannot prove (in the prover's
theory) that the evaluation of the apply$ term is correct, because we
did not provide the warrant as a hypothesis. However, if we amend the
conjecture to include the warrant, as in , the proof (in the prover's
theory) succeeds. While all warrants are assumed true in the evaluation
theory, they must be made explicit as hypotheses for proofs in the prover's
theory. This is the mechanism whereby ACL2 can avoid the ``local problem''
illustrated in Lesson 12 of introduction-to-apply$ and
discussed more thoroughly in ``Limited
Second-Order Functionality in a First-Order Setting'' by Matt Kaufmann
and J Strother Moore.