ARRAYS

an introduction to ACL2 arrays.
```Major Section:  MISCELLANEOUS
```

Below we begin a detailed presentation of ACL2 arrays. ACL2's single-threaded objects (see stobj) provide similar functionality that is generally more efficient but also more restrictive. Related topics:

• SLOW-ARRAY-WARNING -- a warning issued when arrays are used inefficiently

See arrays-example for a brief introduction illustrating the use of ACL2 arrays.

ACL2 provides relatively efficient 1- and 2-dimensional arrays. Arrays are awkward to provide efficiently in an applicative language because the programmer rightly expects to be able to ``modify'' an array object with the effect of changing the behavior of the element accessing function on that object. This, of course, does not make any sense in an applicative setting. The element accessing function is, after all, a function, and its behavior on a given object is immutable. To ``modify'' an array object in an applicative setting we must actually produce a new array object. Arranging for this to be done efficiently is a challenge to the implementors of the language. In addition, the programmer accustomed to the von Neumann view of arrays must learn how to use immutable applicative arrays efficiently.

In this note we explain 1-dimensional arrays. In particular, we explain briefly how to create, access, and ``modify'' them, how they are implemented, and how to program with them. 2-dimensional arrays are dealt with by analogy.

The Logical Description of ACL2 Arrays

An ACL2 1-dimensional array is an object that associates arbitrary objects with certain integers, called ``indices.'' Every array has a dimension, `dim`, which is a positive integer. The indices of an array are the consecutive integers from `0` through `dim-1`. To obtain the object associated with the index `i` in an array `a`, one uses `(aref1 name a i)`. `Name` is a symbol that is irrelevant to the semantics of `aref1` but affects the speed with which it computes. We will talk more about array ``names'' later. To produce a new array object that is like `a` but which associates `val` with index `i`, one uses `(aset1 name a i val)`.

An ACL2 1-dimensional array is actually an alist. There is no special ACL2 function for creating arrays; they are generally built with the standard list processing functions `list` and `cons`. However, there is a special ACL2 function, called `compress1`, for speeding up access to the elements of such an alist. We discuss `compress1` later.

One element of the alist must be the ``header'' of the array. The header of a 1-dimensional array with dimension `dim` is of the form:

```(:HEADER :DIMENSIONS (dim)
:MAXIMUM-LENGTH max
:DEFAULT obj
:NAME name).
```
`Obj` may be any object and is called the ``default value'' of the array. `Max` must be an integer greater than `dim`. `Name` must be a symbol. The `:default` and `:name` entries are optional; if `:default` is omitted, the default value is `nil`. The function `header`, when given a name and a 1- or 2-dimensional array, returns the header of the array. The functions `dimensions`, `maximum-length`, and `default` are similar and return the corresponding fields of the header of the array. The role of the `:dimensions` field is obvious: it specifies the legal indices into the array. The roles played by the `:maximum-length` and `:default` fields are described below.

Aside from the header, the other elements of the alist must each be of the form `(i . val)`, where `i` is an integer and `0 <= i < dim`, and `val` is an arbitrary object.
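As a minimal sketch of the shape just described, the following builds a small array alist by hand and inspects it with `array1p` (the recognizer that also appears in the guard of `cnt` later in this note) and the header accessors. The array name `ex` and its contents are illustrative assumptions, not part of the original presentation.

```lisp
; A hand-built alist with the shape of a 1-dimensional array of dimension 5.
; The name EX and the stored values are hypothetical, chosen for illustration.
(let ((a '((:header :dimensions (5)
                    :maximum-length 15
                    :default uninitialized
                    :name ex)
           (0 . zero)
           (3 . three))))
  (list (array1p 'ex a)          ; is A a well-formed 1-dimensional array?
        (dimensions 'ex a)       ; the :dimensions field of the header
        (maximum-length 'ex a)   ; the :maximum-length field
        (default 'ex a)))        ; the :default field
```

Under the description above this should evaluate to `(T (5) 15 UNINITIALIZED)`. An otherwise identical alist whose `:maximum-length` is `5` should be rejected by `array1p`, since `max` must be greater than `dim`.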

`(Aref1 name a i)` is guarded so that `name` must be a symbol, `a` must be an array and `i` must be an index into `a`. The value of `(aref1 name a i)` is either `(cdr (assoc i a))` or else is the default value of `a`, depending on whether there is a pair in `a` whose `car` is `i`. Note that `name` is irrelevant to the value of an `aref1` expression. You might `:pe aref1` to see how simple the definition is.

`(Aset1 name a i val)` is guarded analogously to the `aref1` expression. The value of the `aset1` expression is essentially `(cons (cons i val) a)`. Again, `name` is irrelevant. Note that `(aset1 name a i val)` is an array, `a'`, with the property that `(aref1 name a' i)` is `val` and, except for index `i`, all other indices into `a'` produce the same value as in `a`. Note also that if `a` is viewed as an alist (which it is) the pair ``binding'' `i` to its old value is in `a'` but ``covered up'' by the new pair. Thus, the length of an array grows by one when `aset1` is done.
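The logical behavior of `aref1` and `aset1` can be checked on a small hand-built array. This is only a sketch; the name `ex` and the values are assumptions made for the example.

```lisp
; Logical behavior of AREF1 and ASET1 on a hand-built (uncompressed) array.
; The name EX is hypothetical.
(let* ((a  '((:header :dimensions (5)
                      :maximum-length 15
                      :default uninitialized
                      :name ex)
             (0 . zero)))
       (a2 (aset1 'ex a 1 'one)))  ; logically (cons '(1 . one) a)
  (list (aref1 'ex a2 1)           ; the new value at index 1
        (aref1 'ex a2 0)           ; other indices are unchanged
        (aref1 'ex a 1)            ; the old array still yields the default
        (length a)
        (length a2)))              ; one pair longer than A
```

The expected value is `(ONE ZERO UNINITIALIZED 2 3)`. Because the alist here was never compressed, ACL2 may print slow-array warnings when this form is evaluated; the values returned are unaffected.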

Because `aset1` covers old values with new ones, an array produced by a sequence of `aset1` calls may have many irrelevant pairs in it. The function `compress1` removes these irrelevant pairs. Thus, `(compress1 name a)` returns an array that is equivalent (vis-a-vis `aref1`) to `a` but which may be shorter. For technical reasons, the alist returned by `compress1` may also list the pairs in a different order than listed in `a`.

To prevent arrays from growing excessively long due to repeated `aset1` operations, `aset1` actually calls `compress1` on the new alist whenever the length of the new alist exceeds the `:maximum-length` entry, `max`, in the header of the array. See the definition of `aset1` (for example by using `:pe`). This is primarily just a mechanism for freeing up `cons` space consumed while doing `aset1` operations.
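To see the automatic compression at work, one can use an array whose `:maximum-length` is deliberately tiny. The sketch below assumes a hypothetical array named `ex2` of dimension `2` with `:maximum-length` `3`, and assigns to the same index twice.

```lisp
; ASET1 compresses the alist once its length would exceed :MAXIMUM-LENGTH.
; The name EX2 and the small sizes are chosen only for illustration.
(let* ((a0 (compress1 'ex2
                      '((:header :dimensions (2)
                                 :maximum-length 3
                                 :default nil
                                 :name ex2)
                        (0 . x))))
       (a1 (aset1 'ex2 a0 0 'y))   ; length 3: not over the limit, no compression
       (a2 (aset1 'ex2 a1 0 'z)))  ; length would be 4 > 3, so COMPRESS1 runs
  (list (length a0) (length a1) (length a2)
        (aref1 'ex2 a2 0)))
```

By the description above the result should be `(2 3 2 Z)`: the second assignment triggers `compress1`, which drops the two covered pairs for index `0` and leaves only the header and `(0 . z)`.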

This completes the logical description of 1-dimensional arrays. 2-dimensional arrays are analogous. The `:dimensions` entry of the header of a 2-dimensional array should be `(dim1 dim2)`. A pair of indices, `i` and `j`, is legal iff `0 <= i < dim1` and `0 <= j < dim2`. The `:maximum-length` must be greater than `dim1*dim2`. `Aref2`, `aset2`, and `compress2` are like their counterparts but take an additional index argument. Finally, the pairs in a 2-dimensional array are of the form `((i . j) . val)`.
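A 2-dimensional counterpart might look as follows. The array name `grid`, its 2-by-3 shape, and its contents are assumptions made for this sketch.

```lisp
; A 2-by-3 array: :MAXIMUM-LENGTH must exceed 2*3 = 6.
; The name GRID and the values are hypothetical.
(let* ((m  (compress2 'grid
                      '((:header :dimensions (2 3)
                                 :maximum-length 7
                                 :default 0
                                 :name grid)
                        ((0 . 0) . 10)
                        ((1 . 2) . 12))))
       (m2 (aset2 'grid m 0 1 11)))  ; "assign" 11 at row 0, column 1
  (list (aref2 'grid m2 0 0)
        (aref2 'grid m2 0 1)
        (aref2 'grid m2 1 1)         ; never assigned, so the default
        (aref2 'grid m2 1 2)))
```

The expected value is `(10 11 0 12)`.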

The Implementation of ACL2 Arrays

Very informally speaking, the function `compress1` ``creates'' an ACL2 array that provides fast access, while the function `aref1` ``maintains'' fast access. We now describe this informal idea more carefully.

`Aref1` is essentially `assoc`. If `aref1` were implemented naively the time taken to access an array element would be linear in the dimension of the array and the number of ``assignments'' to it (the number of `aset1` calls done to create the array from the initial alist). This is intolerable; arrays are ``supposed'' to provide constant-time access and change.

The apparently irrelevant names associated with ACL2 arrays allow us to provide constant-time access and change when arrays are used in ``conventional'' ways. The implementation of arrays makes it clear what we mean by ``conventional.''

Recall that array names are symbols. Behind the scenes, ACL2 associates two objects with each ACL2 array name. The first object is called the ``semantic value'' of the name and is an alist. The second object is called the ``raw lisp array'' and is a Common Lisp array.

When `(compress1 name alist)` builds a new alist, `a'`, it sets the semantic value of `name` to that new alist. Furthermore, it creates a Common Lisp array and writes into it all of the index/value pairs of `a'`, initializing unassigned indices with the default value. This array becomes the raw lisp array of `name`. `Compress1` then returns `a'`, the semantic value, as its result, as required by the definition of `compress1`.

When `(aref1 name a i)` is invoked, `aref1` first determines whether the semantic value of `name` is `a` (i.e., is `eq` to the alist `a`). If so, `aref1` can determine the `i`th element of `a` by invoking Common Lisp's `aref` function on the raw lisp array associated with `name`. Note that no linear search of the alist `a` is required; the operation is done in constant time and involves retrieval of two global variables, an `eq` test and jump, and a raw lisp array access. In fact, an ACL2 array access of this sort is about 5 times slower than a C array access. On the other hand, if `name` has no semantic value or if it is different from `a`, then `aref1` determines the answer by linear search of `a` as suggested by the `assoc`-like definition of `aref1`. Thus, `aref1` always returns the axiomatically specified result. It returns in constant time if the array being accessed is the current semantic value of the name used. The ramifications of this are discussed after we deal with `aset1`.

When `(aset1 name a i val)` is invoked, `aset1` does two `cons`es to create the new array. Call that array `a'`. It will be returned as the answer. (In this discussion we ignore the case in which `aset1` does a `compress1`.) However, before returning, `aset1` determines if `name`'s semantic value is `a`. If so, it makes the new semantic value of `name` be `a'` and it smashes the raw lisp array of `name` with `val` at index `i`, before returning `a'` as the result. Thus, after doing an `aset1` and obtaining a new semantic value `a'`, all `aref1`s on that new array will be fast. Any `aref1`s on the old semantic value, `a`, will be slow.

To understand the performance implications of this design, consider the chronological sequence in which ACL2 (Common Lisp) evaluates expressions: basically inner-most first, left-to-right, call-by-value. An array use, such as `(aref1 name a i)`, is ``fast'' (constant-time) if the alist supplied, `a`, is the value returned by the most recently executed `compress1` or `aset1` on the name supplied. In the functional expression of ``conventional'' array processing, all uses of an array are fast.
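One way to stay within ``conventional'' use is to thread the array through a function so that each `aset1` receives the value returned by the previous one. The sketch below is an illustration under assumptions: the function `fill-squares`, the array name `sq`, and the state global `sq` are hypothetical, and the function is declared in `:program` mode only to sidestep the termination and guard proofs.

```lisp
; Store i*i at every index from I down to 0, always passing along the
; array most recently returned by ASET1.
(defun fill-squares (name a i)
  (declare (xargs :mode :program))
  (cond ((zp (1+ i)) a)              ; done once i is at most -1
        (t (fill-squares name
                         (aset1 name a i (* i i))
                         (1- i)))))

; Start from a freshly compressed, header-only array named SQ.
(assign sq (fill-squares 'sq
                         (compress1 'sq
                                    '((:header :dimensions (5)
                                               :maximum-length 15
                                               :default 0
                                               :name sq)))
                         4))

(aref1 'sq (@ sq) 3)
```

The final `aref1` should return `9`. Since the array handed to each `aset1` is the one returned by the most recent `compress1` or `aset1` on `sq`, every operation here should take the fast path described above.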

The `:name` field of the header of an array is completely irrelevant. Our convention is to store in that field the symbol we mean to use as the name of the raw lisp array. But no ACL2 function inspects `:name` and its primary value is that it allows the user, by inspecting the semantic value of the array -- the alist -- to recall the name of the raw array that probably holds that value. We say ``probably'' since there is no enforcement that the alist was compressed under the name in the header or that all `aset`s used that name. Such enforcement would be inefficient.

Some Programming Examples

In the following examples we will use ACL2 ``global variables'' to hold several arrays. See @, and see assign.

Let the `state` global variable `a` be the 1-dimensional compressed array of dimension `5` constructed below.

```ACL2 !>(assign a (compress1 'demo
  '((:header :dimensions (5)
             :maximum-length 15
             :default uninitialized
             :name demo)
    (0 . zero))))
```
Then `(aref1 'demo (@ a) 0)` is `zero` and `(aref1 'demo (@ a) 1)` is `uninitialized`.

Now execute

```ACL2 !>(assign b (aset1 'demo (@ a) 1 'one))
```
Then `(aref1 'demo (@ b) 0)` is `zero` and `(aref1 'demo (@ b) 1)` is `one`.

All of the `aref1`s done so far have been ``fast.''

Note that we now have two array objects, one in the global variable `a` and one in the global variable `b`. `B` was obtained by assigning to `a`. That assignment does not affect the alist `a` because this is an applicative language. Thus, `(aref1 'demo (@ a) 1)` must still be `uninitialized`. And if you execute that expression in ACL2 you will see that indeed it is. However, a rather ugly comment is printed, namely that this array access is ``slow.'' The reason it is slow is that the raw lisp array associated with the name `demo` is the array we are calling `b`. To access the elements of `a`, `aref1` must now do a linear search. Any reference to `a` as an array is now ``unconventional;'' in a conventional language like Ada or Common Lisp it would simply be impossible to refer to the value of the array before the assignment that produced our `b`.

Now let us define a function that counts how many times a given object, `x`, occurs in an array. For simplicity, we will pass in the name and highest index of the array:

```ACL2 !>(defun cnt (name a i x)
  (declare (xargs :guard
                  (and (array1p name a)
                       (integerp i)
                       (>= i -1)
                       (< i (car (dimensions name a))))
                  :mode :logic
                  :measure (nfix (+ 1 i))))
  (cond ((zp (1+ i)) 0) ; return 0 if i is at most -1
        ((equal x (aref1 name a i))
         (1+ (cnt name a (1- i) x)))
        (t (cnt name a (1- i) x))))
```
To determine how many times `zero` appears in `(@ b)` we can execute:
```ACL2 !>(cnt 'demo (@ b) 4 'zero)
```
The answer is `1`. How many times does `uninitialized` appear in `(@ b)`?
```ACL2 !>(cnt 'demo (@ b) 4 'uninitialized)
```
The answer is `3`, because positions `2`, `3` and `4` of the array contain that default value.

Now imagine that we want to assign `'two` to index `2` and then count how many times the 2nd element of the array occurs in the array. This specification is actually ambiguous. In assigning to `b` we produce a new array, which we might call `c`. Do we mean to count the occurrences in `c` of the 2nd element of `b` or the 2nd element of `c`? That is, do we count the occurrences of `uninitialized` or the occurrences of `two`? If we mean the former the correct answer is `2` (positions `3` and `4` are `uninitialized` in `c`); if we mean the latter, the correct answer is `1` (there is only one occurrence of `two` in `c`).

Below are ACL2 renderings of the two meanings, which we call `[former]` and `[latter]`. (Warning: Our description of these examples, and of an example `[fast former]` that follows, assumes that only one of these three examples is actually executed; for example, they are not executed in sequence. See ``A Word of Warning'' below for more about this issue.)

```(cnt 'demo (aset1 'demo (@ b) 2 'two) 4 (aref1 'demo (@ b) 2))  ; [former]

(let ((c (aset1 'demo (@ b) 2 'two)))                           ; [latter]
  (cnt 'demo c 4 (aref1 'demo c 2)))
```
Note that in `[former]` we create `c` in the second argument of the call to `cnt` (although we do not give it a name) and then refer to `b` in the fourth argument. This is unconventional because the second reference to `b` in `[former]` is no longer the semantic value of `demo`. While ACL2 computes the correct answer, namely `2`, the execution of the `aref1` expression in `[former]` is done slowly.

A conventional rendering with the same meaning is

```(let ((x (aref1 'demo (@ b) 2)))                           ; [fast former]
  (cnt 'demo (aset1 'demo (@ b) 2 'two) 4 x))
```
which fetches the 2nd element of `b` before creating `c` by assignment. It is important to understand that `[former]` and `[fast former]` mean exactly the same thing: both count the number of occurrences of `uninitialized` in `c`. Both are legal ACL2 and both compute the same answer, `2`. Indeed, we can symbolically transform `[fast former]` into `[former]` merely by substituting the binding of `x` for `x` in the body of the `let`. But `[fast former]` can be evaluated faster than `[former]` because all of the references to `demo` use the then-current semantic value of `demo`, which is `b` in the first line and `c` throughout the execution of the `cnt` in the second line. `[Fast former]` is the preferred form, both because of its execution speed and its clarity. If you were writing in a conventional language you would have to write something like `[fast former]` because there is no way to refer to the 2nd element of the old value of `b` after smashing `b` unless it had been saved first.

We turn now to `[latter]`. It is both clear and efficient. It creates `c` by assignment to `b` and then it fetches the 2nd element of `c`, `two`, and proceeds to count the number of occurrences in `c`. The answer is `1`. `[Latter]` is a good example of typical ACL2 array manipulation: after the assignment to `b` that creates `c`, `c` is used throughout.

It takes a while to get used to this because most of us have grown accustomed to the peculiar semantics of arrays in conventional languages. For example, in raw lisp we might have written something like the following, treating `b` as a ``global variable'':

```(cnt 'demo (aset 'demo b 2 'two) 4 (aref 'demo b 2))
```
which sort of resembles `[former]` but actually has the semantics of `[latter]` because the `b` from which `aref` fetches the 2nd element is not the same `b` used in the `aset`! The array `b` is destroyed by the `aset` and `b` henceforth refers to the array produced by the `aset`, as written more clearly in `[latter]`.

A Word of Warning: Users must exercise care when experimenting with `[former]`, `[latter]` and `[fast former]`. Suppose you have just created `b` with the assignment shown above,

```ACL2 !>(assign b (aset1 'demo (@ a) 1 'one))
```
If you then evaluate `[former]` in ACL2 it will complain that the `aref1` is slow and compute the answer, as discussed. Then suppose you evaluate `[latter]` in ACL2. From our discussion you might expect it to execute fast -- i.e., issue no complaint. But in fact you will find that it complains repeatedly. The problem is that the evaluation of `[former]` changed the semantic value of `demo` so that it is no longer `b`. To try the experiment correctly you must make `b` be the semantic value of `demo` again before the next example is evaluated. One way to do that is to execute
```ACL2 !>(assign b (compress1 'demo (@ b)))
```
before each expression. Because of issues like this it is often hard to experiment with ACL2 arrays at the top-level. We find it easier to write functions that use arrays correctly and efficiently than to use them that way interactively.

This last assignment also illustrates a very common use of `compress1`. While it was introduced as a means of removing irrelevant pairs from an array built up by repeated assignments, it is actually most useful as a way of ensuring fast access to the elements of an array.

Many array processing tasks can be divided into two parts. During the first part the array is built. During the second part the array is used extensively but not modified. If your programming task can be so divided, it might be appropriate to construct the array entirely with list processing, thereby saving the cost of maintaining the semantic value of the name while few references are being made. Once the alist has stabilized, it might be worthwhile to treat it as an array by calling `compress1`, thereby gaining constant time access to it.

ACL2's theorem prover uses this technique in connection with its implementation of the notion of whether a rune is disabled or not. Associated with every rune is a unique integer index, called its ``nume.'' When each rule is stored, the corresponding nume is stored as a component of the rule. Theories are lists of runes and membership in the ``current theory'' indicates that the corresponding rule is enabled. But these lists are very long and membership is a linear-time operation. So just before a proof begins we map the list of runes in the current theory into an alist that pairs the corresponding numes with `t`. Then we compress this alist into an array. Thus, given a rule we can obtain its nume (because it is a component) and then determine in constant time whether it is enabled. The array is never modified during the proof, i.e., `aset1` is never used in this example. From the logical perspective this code looks quite odd: we have replaced a linear-time membership test with an apparently linear-time `assoc` after going to the trouble of mapping from a list of runes to an alist of numes. But because the alist of numes is an array, the ``apparently linear-time `assoc`'' is more apparent than real; the operation is constant-time.
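The build-then-compress pattern can be sketched in miniature. This is not ACL2's actual code for its enabled structure: the array name `enabled-ar`, the dimension `8`, and the particular ``numes'' `1`, `4` and `6` are all assumptions made for the example.

```lisp
; Phase 1: build the alist with ordinary list processing.
; Phase 2: compress it once, then only read from it.
(assign enabled-ar
        (compress1 'enabled-ar
                   (cons '(:header :dimensions (8)
                                   :maximum-length 9
                                   :default nil
                                   :name enabled-ar)
                         (pairlis$ '(1 4 6) '(t t t)))))

(aref1 'enabled-ar (@ enabled-ar) 4)  ; index 4 was paired with T
(aref1 'enabled-ar (@ enabled-ar) 5)  ; index 5 was not, so the default
```

The two lookups should return `T` and `NIL` respectively. Because the array is compressed once and never modified afterwards, each lookup uses the raw lisp array rather than a search of the alist.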