Major Section: ACL2 Documentation

Bob Boyer and Warren Hunt have developed a canonical representation for ACL2 data objects and a function memoization mechanism to facilitate reuse of previously computed results. This facility includes procedures to read and print ACL2 expressions in such a way that repetition of some ACL2 objects is eliminated, thereby permitting a kind of on-the-fly file compression. The implementation does not alter the semantics of ACL2 except to add a handful of definitions.

Much of the documentation for this topic is taken from the paper ``Function Memoization and Unique Object Representation for ACL2 Functions'' by Robert S. Boyer and Warren A. Hunt, Jr., which has appeared in the Sixth International Workshop on the ACL2 Theorem Prover and Its Applications, ACM Digital Library, 2006.

You can build an experimental version of ACL2 that includes hash
cons, function memoization, and fast association lists (applicative
hash tables). An easy way is to include the following with a
`make`

command:

ACL2_HONS=hSo for example, to make an executable image and also documentation (which will appear in subdirectories

`doc/EMACS`

and
`doc/HTML`

):
make large DOC ACL2_HONS=h

### CLEAR-MEMOIZE-TABLE -- Forget values remembered for the function FN

### CLEAR-MEMOIZE-TABLES -- Forget values remembered for all the memoized functions.

### CONS-SUBTREES -- Computes the hons graph of a term

### HONS -- hash cons

### HONS-ACONS -- acons with hons

### HONS-ACONS! -- acons with hons

### HONS-GET -- assoc-equal for hons

### HONS-LET -- push a temporary, new environment for hash consing

`cons`

function. In Common Lisp, `cons`

is guaranteed to
provide a new pair, distinct from any previously created pair. We
have defined a new ACL2 function `HONS`

that is logically
identical to the ACL2 `cons`

function, but whose implementation
usually reuses an existing pair if its components are identical to
the components of an existing pair. A record of ACL2 HONS objects
is kept, and when an ACL2 function calls `hons`

ACL2 searches for
an existing identical pair before allocating a new pair; this
operation been called ``hash consing''.It appears that hash consing was first conceived by A. P. Ershov in 1957, to speed up the recognition of common subexpressions. Ershov showed how to collapse trees to minimal DAGs by traversing trees bottom up, and he used hashing to eliminate the re-evaluation of common subexpressions. Later, Eiichi Goto implemented a Lisp system with a built-in hash consing operation: his h-CONS cells were rewrite protected and free of duplicate copies, and Goto used this hash consing operation to facilitate the implementation of a symbolic algebra system he developed.

Memoizing functions also has a long history. In 1967, Donald Michie proposed using memoized functions to improve the performance of machine learning. Rote learning was improved by a learning function not forgetting what it had previously learned; this information was stored as memoized function values.

The use of hash consing has appeared many times. For instance, Henry Baker using hash consing to improve the performance of the well-known Boyer rewriting benchmark. Baker used both hash consing and function memoization to improve the speed of the Takeuchi function, exactly in the spirit of our implementation, but without the automated, system-wide integration we report here.

The ACL2 implementation permits memoization of user-defined functions. During execution a user may enable or disable function memoization on an individual function basis, may clear memoization tables, and may even keep a stack of memoization tables. This facility takes advantage of our implementation where we keep one copy of each distinct ACL2 data object. Due to the functional nature of ACL2, it is sufficient to have at most one copy of any data structure; thus, a user may arrange to keep data canonicalized. This implementation extends to the entire ACL2 system the benefits enjoyed by BDDs: canonicalization, memoization, and fast equality check.

We have defined various algorithms using these features, and we have observed, in some cases, substantial performance increases. For instance, we have implemented unordered set intersection and union operations that operate in time roughly linear in the size of the sets. Without using arrays, we defined a canonical representation for Boolean functions using ACL2 objects. We have investigated the performance of rewriting and tree consensus algorithms to good effect, and we believe function memoization offers interesting opportunities to simplify algorithm definition while simultaneously providing performance improvements.

We recommend that users focus at first on the logical definitions of
`hons`

and other primitives rather than their underlying Common
Lisp implementations. Integrated with this memoization system is a
performance monitoring system, which can provide real-time feedback
on the operation and usefulness of `hons`

and function
memoization. For a more detailed description of these tools, please
see the ACL2 2006 workshop paper mentioned above.

The Fibonacci function is a small example that demonstrates the utility of function memoization. The following definition exhibits a runtime that is exponential in its input argument.

(defun fib (x) (declare (xargs :guard (natp x))) (mbe :logic (cond ((zp x) 0) ((= x 1) 1) (t (+ (fib (- x 1)) (fib (- x 2))))) :exec (if (< x 2) x (+ (fib (- x 1)) (fib (- x 2))))))

Below we show how the ACL2 `time$`

utility can measure the time
to execute a call to the `fib`

function (with some editing to
avoid noise from the underlying Common Lisp implementation).
`Memoize`

is actually an ACL2 macro that expands to a call of
the ACL2 function used to identify a function for memoization;
see memoize. This function also accepts a well-formed term that
must be true in order for the system to memoize a call of the
memoized function; to ensure that an instance of the term is safe
for evaluation in Common Lisp, a check is performed that if the
guard of the memoized function is satisfied, then this instance
will execute without error.

ACL2 !>(time$ (fib 40))... took 2,641 milliseconds (2.641 seconds) to run with 8 available CPU cores. During that period, 2,644 milliseconds (2.644 seconds) were spent in user mode 0 milliseconds (0.000 seconds) were spent in system mode 16 bytes of memory allocated. 102334155 ACL2 !>(memoize 'fib)

Summary Form: ( TABLE MEMOIZE-TABLE ...) Rules: NIL Warnings: None Time: 0.04 seconds (prove: 0.00, print: 0.00, other: 0.04)

Summary Form: ( PROGN (TABLE MEMOIZE-TABLE ...) ...) Rules: NIL Warnings: None Time: 0.04 seconds (prove: 0.00, print: 0.00, other: 0.04) FIB ACL2 !>(time$ (fib 40))

... took 19 milliseconds (0.019 seconds) to run with 8 available CPU cores. During that period, 20 milliseconds (0.020 seconds) were spent in user mode 0 milliseconds (0.000 seconds) were spent in system mode 539,088 bytes of memory allocated. 193 minor page faults, 0 major page faults, 0 swaps. 102334155 ACL2 !>(time$ (fib 100))

... took 0 milliseconds (0.000 seconds) to run with 8 available CPU cores. During that period, 0 milliseconds (0.000 seconds) were spent in user mode 0 milliseconds (0.000 seconds) were spent in system mode 5,760 bytes of memory allocated. 3 minor page faults, 0 major page faults, 0 swaps. 354224848179261915075 ACL2 !>(unmemoize 'fib)

We see that once the function `fib`

is identified as a function
for which function calls should be memoized, the execution times are
substantially reduced. Finally, we can prevent `fib`

from being
further memoized; in fact, `unmemoize`

erases the previously
memoized values.

The implementation of hash consing, memoization, and applicative hash tables involves several facets: canonical representation of ACL2 data, function memoization, and the use of Lisp hash tables to improve the performance of association list operations. We discuss each of these in turn, and we mention some subtle interrelationships. Although it is not necessary to understand the discussion in this section, it may permit some users to better use this implementation. This discussion may be confusing for some ACL2 users as it makes references to Lisp implementation features.

The ACL2 system is actually implemented as a Lisp program that is layered on top of a Common Lisp system implementation. ACL2 data constants are implemented with their corresponding counterparts in Common Lisp; that is, ACL2 cons pairs, strings, characters, numbers, and symbols are implemented with their specific Common Lisp counterparts. This choice permits a number of ACL2 primitive functions to be implemented with their corresponding Common Lisp functions, but there are indeed differences. ACL2 is a logic, and as such, it does not specify anything to do with physical storage or execution performance. When ACL2 is used, the knowledgeable user can write functions to facilitate the reuse of some previously computed values.

Recall the three mechanisms under discussion: hash consing, function
memoization, and fast association list operations. The function
memoization mechanism takes advantage of the canonical
representation of data objects provided by the `hons`

operation
as does the fast association list operation mechanism. Each time
`hons`

is invoked, its arguments are themselves converted, if
necessary, to uniquely represented objects.

The ACL2 universe is recursively closed under the `cons`

pairing
operation and the atoms. Hash consing (`hons`

) is logically
identical to `cons`

, but a set of tables is used to record each
`hons`

pair. In fact, the implementation provides a stack of such
tables; thus a new environment of tables can be requested, used, and
then released; see hons-let. When a `hons`

pair is requested,
the implementation checks, in the current set of tables, whether the
requested pair already exists. If not, a new pair is created and a
record of that pair is made; otherwise, a reference to the
previously created pair is returned. Thus, any data object created
with `hons`

has a unique representation, as does every
subcomponent. We also arrange for strings to have a unique
representation -- only one copy of each different string is kept
-- and when any previously unseen string is an argument to
`hons`

, we add the string to a unique table of strings. A
system-wide benefit of having a canonical representation for data is
that equality comparisons between any two data objects can be done
in constant time.

The definition of `hons`

in no way changes the operation of
`cons`

-- `hons`

creates a `cons`

pair. When asked to
create a `hons`

, the implementation checks to see if there is a
`cons`

with the same `car`

and `cdr`

as the `hons`

being
requested. Thus, the only difference between the results of a
`hons`

call and a corresponding `cons`

call is a notation in
some invisible (to the ACL2 logic) tables where we record which
`cons`

elements are also `hons`

elements. Since a `hons`

is
nothing but a `cons`

, the operation of `car`

and `cdr`

is
unchanged. In fact, the implementation is designed so that at any
time it is safe to clear the table identifying which `cons`

elements are also considered `hons`

elements.

User-defined functions with defined and verified guards can be
memoized. When a function is memoized, a user-supplied condition
restricts the domain when memoization is attempted. When the
condition is satisfied, memoization is attempted (assuming that
memoization for the function is presently enabled); otherwise, the
function is just evaluated. Each memoized function has a hash table
that is used to keep track of a unique list of function arguments
paired with their values. If appropriate, for each function the
corresponding table is checked to see if a previous call with
exactly the same arguments already exists in the table: if so, then
the associated value is returned; if not, then the function is
evaluated and a new key-value pair is added to the table of memoized
values so that some future call will benefit from the memoization.
With ACL2 user functions memoization can be dynamically enabled and
disabled. There is an ACL2 function that clears a specific
memoization table. And finally, just as with the `hons`

table, a
stack of these function memoization tables is maintained; that is,
it is possible to memoize a function, use it a bit, set the memoized
values aside, start a new table, use it, and then return to the
original table.

We next discuss the fast lookup operation for association lists.
When a pair is added to an association list using the functions
`hons-acons`

or `hons-acons!`

, the system also records the
key-value pair in an associated hash table. As long as a user only
used these two functions when placing key-value pairs on an
association list, the key-value pairs in the corresponding hash
table will be synchronized with the key-value pairs in the
association list. Later, if `hons-get`

is used to look up a key,
then instead of performing a linear search of the association list
we consult the associated hash table. If a user does not strictly
follow this discipline, then a linear search may be required. In
some sense, these association lists are much like ACL2 arrays, but
without the burden of explicitly naming the arrays. The ACL2 array
`compress1`

function is analogous to the functions
`hons-shrink-alist`

and `hons-shrink-alist!`

. There are
user-level ACL2 functions that allow the associated hash tables to
be cleared and removed.

Finally, we would advise anyone who is using CCL in a research
environment to stay plugged into the ``trunk'' or ``bleeding edge''
of CCL development. This is very easy to do by typing a few
commands to a shell, for example standing above the target directory
as follows, provided one has `svn`

working.

For linux:

rm -rf ccl svn co http://svn.clozure.com/publicsvn/openmcl/trunk/linuxx8664/ccl

For an x86 Macintosh running the Darwin OS:

svn co http://svn.clozure.com/publicsvn/openmcl/trunk/darwinx8664/ccl

To keep up to date, you may find it sufficient to do:

cd ccl svn update

REFERENCES

Baker, Henry G., The Boyer Benchmark at Warp Speed. ACM Lisp Pointers V,3 (Jul-Sep 1992), pages 13-14.

Baker, Henry G., A Tachy 'TAK'. ACM Lisp Pointers Volume 3, July-September, 1992, pages 22-23.

Robert S. Boyer and Warren A. Hunt, Jr., Function Memoization and Unique Object Representation for ACL2 Functions, in the Sixth International Workshop on the ACL2 Theorem Prover and Its Applications, ACM Digital Library, 2006.

A. P. Ershov. On Programming of Arithmetic Operations. In the Communications of the ACM, Volume 118, Number 3, August, 1958, pages 427-430.

Eiichi Goto, Monocopy and Associative Algorithms in Extended Lisp, TR-74-03, University of Toyko, 1974.

Richard P. Gabriel. Performance and Evaluation of Lisp Systems. MIT Press, 1985.

Donald Michie. Memo functions: a Language Feature with Rote Learning Properties. Technical Report MIP-R-29, Department of Artificial Intelligence, University of Edinburgh, Scotland, 1967.

Donald Michie. Memo Functions and Machine Learning. In Nature,
Volumne 218, 1968, pages 19-22.