==============================================
ACL2 challenge problem: Formalizing BitCryptol
John Matthews
April 26, 2005
==============================================
Cryptol (http://www.cryptol.net) is a domain-specific declarative
language for specifying crypto algorithms. A notable feature of
Cryptol is that Cryptol programs can define mutually recursive streams
(infinite sequences) as values.
Here at Galois Connections, Inc. (http://www.galois.com), we have run
into difficulties in giving a deep embedding of Cryptol in
ACL2, since ACL2's logic does not permit functions nor infinite
sequences as first-class values. Below, we describe BitCryptol, a tiny
language derived from Cryptol. BitCryptol contains just enough
language features to illustrate the difficulty of giving a deep
embedding of Cryptol in ACL2. The PDF slides accompanying this text
note provide some additional details.
Types
-----
A BitCryptol value can be of one of the following types.
Bit ; A Boolean value, either nil or t.
Nat ; A stream index, which is represented as a natural number
Stream of Bits ; An infinite sequence of bits, represented as a
; function from naturals to bits
Expressions
-----------
A BitCrypol expression can take on one of the following forms.
nil ; Constant bit false
t ; Constant bit true
; Stream index- or stream-valued variable
`(Num ,n) ; Constant stream index
`(Xor ,b1 ,b2) ; Exclusive-or of b1 and b2
`(Neg ,b) ; Logical negation of b
`(SLam ,v ,b) ; Stream comprehension (lambda-abstraction).
; v is a stream-index variable, and b is a bit
; expression that can mention v. Returns a stream.
`(At ,s ,n) ; Returns the n'th element of stream s
`(SCons ,b ,s) ; Appends bit b to the front of stream s
`(STail ,s) ; Removes the first element from s
`(SWhere ,s ; Evaluates s in the context of local stream definitions
((,v1 . ,s1) ; v1, v2, .... The bodies s1, s2, ... of the local stream
(,v2 . ,s2) ; can be mutually recursive, that is, they can refer to
...)) ; other streams defined within the same or enclosing
; SWhere expressions.
Semantics and examples
----------------------
The semantics function for BitCryptol is called SEM. It takes an
expression and an environment (i.e. alist) of values, and returns the
value of the expression in that environment. Note that SEM is higher
order, in that streams are represented as
lambda-abstractions. Therefore the semantics as I give it here can not
be directly expressed in ACL2's logic, as of version 2.9.
Here are some equations the semantics function should satisfy for
well-formed expressions. These equations are not complete, in that
they assume that the BitCryptol expressions are type-correct and that
all stream definitions are well-founded. However, I have given a
complete semantics for BitCryptol in Isabelle/HOL, just to make sure
the language can be given a set-theoretic semantics.
Here are the equations.
;Bit constant zero.
(sem nil env) = nil
;Bit constant one.
(sem t env) = t
;Variable, represented as a string.
ASSUMING: (stringp v)
(sem v env)
= (cdr (assoc-equal v env))
;Exclusive-or of two bits
(sem `(Xor ,b1 ,b2) env)
= (not (equal (sem b1 env)
(sem b2 env)))
;Stream-index constant, represented as a natural number.
ASSUMING: (natp n)
(sem n env) = n
;Anonymous stream lambda-function.
;Takes a stream index variable name, and
; a bit-valued expression as arguments. Returns
; a stream, represented as a function from stream
; indexes (natural numbers) to bits (Booleans).
(sem `(SLam ,v ,b) env)
= (lambda (n)
(let ((newenv `((,v . ,n) ,@env)))
(sem b newenv)))
;Given a stream and a stream-index, returns
; the stream element at that index.
(sem `(At ,s ,n) env)
= (apply (sem s env)
(sem n env))
;Appends a bit to the front of a stream.
(sem `(SCons ,b ,s) env)
= (lambda (n)
(if (zp n)
(sem b env)
(apply (sem s env)
(- n 1))))
;Takes the tail of a stream.
(sem `(STail ,s) env)
= (lambda (n)
(apply (sem s env)
(+ n 1)))
;A stream-where clause. Evaluates stream
; expression s in the context of local stream
; definitions v1, v2, ....
; The local stream definitions can be mutually
; recursive.
(sem `(SWhere ,s
((,v1 . ,s1)
(,v2 . ,s2)
...))
env)
= (LETREC ((recenv `((,v1 . ,f1)
(,v2 . ,f2)
...))
(f1 (lambda (n) (apply (sem s1 recenv) n)))
(f2 (lambda (n) (apply (sem s2 recenv) n)))
...)
(sem s recenv))
NOTE: for BitCryptol, if a stream is not eventually well founded at
index n, then the value at that index is nil. The formal semantics is
defined in Isabelle by a "well-founded stream fixpoint" functional,
called fix-senv. The functional takes an environment of approximations
to the final stream values, and returns a "better" approximation. Using
fix-senv, the real equation for SWhere expressions is
(sem `(SWhere ,s
((,v1 . ,s1)
(,v2 . ,s2)
...))
env)
= (let ((recenv (fix-senv
(lambda (renv)
`((,v1 . ,(sem s1 renv))
(,v2 . ,(sem s2 renv))
...)))))
(sem s (append recenv env)))
The PDF slides contain the definition of a simplified version of
fix-senv, called wf-fix-stream. The wf-fix-stream function defines a
single recursive stream only, rather than a family of mutually
recursive streams.
Example
-------
Here is an example program. It computes the least significant bit of
the N'th element of the Fibonacci sequence. This is not the simplest
definition, but is instead designed to use most of the syntax of
BitCryptol.
First, here is the concrete syntax for the example.
fib @@ n swhere { rec fib = nil ## fib-tail
and fib-tail = t ## (fcn n. (stail fib @@ n) xor (fib @@ n) }
Here is the corresponding abstract ACL2 syntax for the program,
assuming (natp N).
`(SWhere (At "fib" ,N)
(("fib" . (SCons nil "fib-tail"))
("fib-tail" . (SCons t (SLam "n" (Xor (At (SDrop 1 "fib") "n")
(At "fib" "n")))))))
Compositional operational semantics
-----------------------------------
Since ACL2 does not currently support any kind of function object as
a first-class value, we have tried for quite a while to find an operational
semantics that is still compositional and can be admitted into ACL2. Our
basic approach has been to add an "observation" parameter to SEM, that
indicates what finite observation we are interested in when SEM is evaluating
an infinitary object, such as a stream.
Then, streams and other infinitary objests (such as functions) are
stored syntactically in the environment (along with their lexical
closure environments), rather than as function values.
The chief question then becomes: What measure function do we use to ensure termination? We have tried several, of increasing complexity, without success:
* Expression measure: The first measure would be simply to use
ACL2-COUNT on the expression argument being evaluated by SEM. However,
this measure increases when we lookup the value of a stream definition
in the environment, since the size of the expression defining the
stream is larger than the variable name.
* Expression+environment measure: Our next idea was to store "high water
marks" on the stream definitions in the environment. A high water mark
represents the maximum index that we will evaluate a stream at.
During stream variable lookup, SEM will check to make sure the current
observation index is less than the high water mark. If not, then SEM
returns nil. Otherwise, it evaluates the expression, after setting the
high water mark for the stream to the current observation index.
The measure function for any given environment entry would then be
based on a lexicographic ordering of a high water mark and the size of
the expression. In the case of the expression currently being
evaluated by SEM, the measure would correspond to a lexicographic
ordering of the current observation and the size of the expression
being evaluated. The overall measure would be the sum of the
environment entry and current expression measures.
However, This measure can't be used for the following reasons:
- Evaluating STail expressions causes this measure to increase,
since the observation parameter must be incremented when evaluating
the argument to STail.
- Evaluating SWhere expressions causes the measure to increase, since
new stream definitions whose high water marks are all (omega), are
added to the current environment.
* $\omega$-polynomial measure: We finally tried to devise a
recursively-defined measure function on the expression being
evaluated, the current observation index, and the environment of
definitions, that would decrease when evaluating SWhere
expressions. The measure function was defined in terms of {\em
$\omega$-polynomial} arithmetic operators, which unlike ACL2's current
ordinal operators are strictly monotonic. We hoped that using
$\omega$-polynomials would allow us to naturally characterize the
depth of nested stream definitions, which would decrease when adding a
SWHere expression's local definitions to the environment.
The associated PDF slide contains some of the details, but this
measure too failed. The problem is that we couldn't determine what
the measure of a recursively defined stream definition should be,
since the definition contained free stream variables whose measure had
not yet been calculated.
Current solution
----------------
At this point we have fallen back to performing a shallow embedding of
each Cryptol program, where each stream is defined as an ACL2 function.
This requires us to performing lambda-lifting, since ACL2 function definitions
can't be nested, and to convert stream comprehensions to expressions that
manipulate stream indexes explicitly.
The advantage of this approach is that it is feasible; the
disadvantage is that now evaluators must inspect the ACL2 translation
of each every Cryptol program being certified, rather than
once-and-for-all inspecting a semantics function for the Cryptol
language as a whole.
A possible long-term solution: higher order functions in ACL2
-------------------------------------------------------------
Since it is relatively straightfoward to model BitCryptol in higher
order logic, I am advocating that ACL2 add some form of first-class
functions as new ACL2 objects. However, I am *not* suggesting that we
turn ACL2 into a higher order logic! Rather, we could follow the same
approach that set theory takes, and continue to have a first order
logic with functions as a special kind of data value.
We would still have to be careful to avoid the logical paradoxes that
can occur when one is allowed to apply a function to itself as an
argument. So I'm suggesting that objects in ACL2 have a "rank"
associated with them. The normal atoms and cons cells of ACL2 would
have rank zero. A function that can take rank zero objects to rank
zero objects would have rank one. A function that can take rank zero
and rank one objects to rank zero or rank one objects would have rank
two, and so on. A function object would be defined to return nil
whenever it is applied to a value whose rank is too high. In particular,
this would guarantee that applying a function object to itself results
in nil.
With this model, function objects can easily be modeled as ZFC sets of
pairs, which should rule out any logical inconsistancy. Matt Kaufmann
has pointed out that we also need to make sure that the axioms
comprising function objects form a conservative extension to ACL2, so
that they behave nicely with respect to local events and functional
instantiation.
The PDF slides briefly sketch out one possible formulation of function
objects, but others are certainly possible. In fact, Matt Kaufmann has
recently started working on a promising alternative proposal for
adding function objects to ACL2.