============================================== ACL2 challenge problem: Formalizing BitCryptol John Matthews April 26, 2005 ============================================== Cryptol (http://www.cryptol.net) is a domain-specific declarative language for specifying crypto algorithms. A notable feature of Cryptol is that Cryptol programs can define mutually recursive streams (infinite sequences) as values. Here at Galois Connections, Inc. (http://www.galois.com), we have run into difficulties in giving a deep embedding of Cryptol in ACL2, since ACL2's logic does not permit functions nor infinite sequences as first-class values. Below, we describe BitCryptol, a tiny language derived from Cryptol. BitCryptol contains just enough language features to illustrate the difficulty of giving a deep embedding of Cryptol in ACL2. The PDF slides accompanying this text note provide some additional details. Types ----- A BitCryptol value can be of one of the following types. Bit ; A Boolean value, either nil or t. Nat ; A stream index, which is represented as a natural number Stream of Bits ; An infinite sequence of bits, represented as a ; function from naturals to bits Expressions ----------- A BitCrypol expression can take on one of the following forms. nil ; Constant bit false t ; Constant bit true ; Stream index- or stream-valued variable `(Num ,n) ; Constant stream index `(Xor ,b1 ,b2) ; Exclusive-or of b1 and b2 `(Neg ,b) ; Logical negation of b `(SLam ,v ,b) ; Stream comprehension (lambda-abstraction). ; v is a stream-index variable, and b is a bit ; expression that can mention v. Returns a stream. `(At ,s ,n) ; Returns the n'th element of stream s `(SCons ,b ,s) ; Appends bit b to the front of stream s `(STail ,s) ; Removes the first element from s `(SWhere ,s ; Evaluates s in the context of local stream definitions ((,v1 . ,s1) ; v1, v2, .... The bodies s1, s2, ... of the local stream (,v2 . ,s2) ; can be mutually recursive, that is, they can refer to ...)) ; other streams defined within the same or enclosing ; SWhere expressions. Semantics and examples ---------------------- The semantics function for BitCryptol is called SEM. It takes an expression and an environment (i.e. alist) of values, and returns the value of the expression in that environment. Note that SEM is higher order, in that streams are represented as lambda-abstractions. Therefore the semantics as I give it here can not be directly expressed in ACL2's logic, as of version 2.9. Here are some equations the semantics function should satisfy for well-formed expressions. These equations are not complete, in that they assume that the BitCryptol expressions are type-correct and that all stream definitions are well-founded. However, I have given a complete semantics for BitCryptol in Isabelle/HOL, just to make sure the language can be given a set-theoretic semantics. Here are the equations. ;Bit constant zero. (sem nil env) = nil ;Bit constant one. (sem t env) = t ;Variable, represented as a string. ASSUMING: (stringp v) (sem v env) = (cdr (assoc-equal v env)) ;Exclusive-or of two bits (sem `(Xor ,b1 ,b2) env) = (not (equal (sem b1 env) (sem b2 env))) ;Stream-index constant, represented as a natural number. ASSUMING: (natp n) (sem n env) = n ;Anonymous stream lambda-function. ;Takes a stream index variable name, and ; a bit-valued expression as arguments. Returns ; a stream, represented as a function from stream ; indexes (natural numbers) to bits (Booleans). (sem `(SLam ,v ,b) env) = (lambda (n) (let ((newenv `((,v . ,n) ,@env))) (sem b newenv))) ;Given a stream and a stream-index, returns ; the stream element at that index. (sem `(At ,s ,n) env) = (apply (sem s env) (sem n env)) ;Appends a bit to the front of a stream. (sem `(SCons ,b ,s) env) = (lambda (n) (if (zp n) (sem b env) (apply (sem s env) (- n 1)))) ;Takes the tail of a stream. (sem `(STail ,s) env) = (lambda (n) (apply (sem s env) (+ n 1))) ;A stream-where clause. Evaluates stream ; expression s in the context of local stream ; definitions v1, v2, .... ; The local stream definitions can be mutually ; recursive. (sem `(SWhere ,s ((,v1 . ,s1) (,v2 . ,s2) ...)) env) = (LETREC ((recenv `((,v1 . ,f1) (,v2 . ,f2) ...)) (f1 (lambda (n) (apply (sem s1 recenv) n))) (f2 (lambda (n) (apply (sem s2 recenv) n))) ...) (sem s recenv)) NOTE: for BitCryptol, if a stream is not eventually well founded at index n, then the value at that index is nil. The formal semantics is defined in Isabelle by a "well-founded stream fixpoint" functional, called fix-senv. The functional takes an environment of approximations to the final stream values, and returns a "better" approximation. Using fix-senv, the real equation for SWhere expressions is (sem `(SWhere ,s ((,v1 . ,s1) (,v2 . ,s2) ...)) env) = (let ((recenv (fix-senv (lambda (renv) `((,v1 . ,(sem s1 renv)) (,v2 . ,(sem s2 renv)) ...))))) (sem s (append recenv env))) The PDF slides contain the definition of a simplified version of fix-senv, called wf-fix-stream. The wf-fix-stream function defines a single recursive stream only, rather than a family of mutually recursive streams. Example ------- Here is an example program. It computes the least significant bit of the N'th element of the Fibonacci sequence. This is not the simplest definition, but is instead designed to use most of the syntax of BitCryptol. First, here is the concrete syntax for the example. fib @@ n swhere { rec fib = nil ## fib-tail and fib-tail = t ## (fcn n. (stail fib @@ n) xor (fib @@ n) } Here is the corresponding abstract ACL2 syntax for the program, assuming (natp N). `(SWhere (At "fib" ,N) (("fib" . (SCons nil "fib-tail")) ("fib-tail" . (SCons t (SLam "n" (Xor (At (SDrop 1 "fib") "n") (At "fib" "n"))))))) Compositional operational semantics ----------------------------------- Since ACL2 does not currently support any kind of function object as a first-class value, we have tried for quite a while to find an operational semantics that is still compositional and can be admitted into ACL2. Our basic approach has been to add an "observation" parameter to SEM, that indicates what finite observation we are interested in when SEM is evaluating an infinitary object, such as a stream. Then, streams and other infinitary objests (such as functions) are stored syntactically in the environment (along with their lexical closure environments), rather than as function values. The chief question then becomes: What measure function do we use to ensure termination? We have tried several, of increasing complexity, without success: * Expression measure: The first measure would be simply to use ACL2-COUNT on the expression argument being evaluated by SEM. However, this measure increases when we lookup the value of a stream definition in the environment, since the size of the expression defining the stream is larger than the variable name. * Expression+environment measure: Our next idea was to store "high water marks" on the stream definitions in the environment. A high water mark represents the maximum index that we will evaluate a stream at. During stream variable lookup, SEM will check to make sure the current observation index is less than the high water mark. If not, then SEM returns nil. Otherwise, it evaluates the expression, after setting the high water mark for the stream to the current observation index. The measure function for any given environment entry would then be based on a lexicographic ordering of a high water mark and the size of the expression. In the case of the expression currently being evaluated by SEM, the measure would correspond to a lexicographic ordering of the current observation and the size of the expression being evaluated. The overall measure would be the sum of the environment entry and current expression measures. However, This measure can't be used for the following reasons: - Evaluating STail expressions causes this measure to increase, since the observation parameter must be incremented when evaluating the argument to STail. - Evaluating SWhere expressions causes the measure to increase, since new stream definitions whose high water marks are all (omega), are added to the current environment. * $\omega$-polynomial measure: We finally tried to devise a recursively-defined measure function on the expression being evaluated, the current observation index, and the environment of definitions, that would decrease when evaluating SWhere expressions. The measure function was defined in terms of {\em $\omega$-polynomial} arithmetic operators, which unlike ACL2's current ordinal operators are strictly monotonic. We hoped that using $\omega$-polynomials would allow us to naturally characterize the depth of nested stream definitions, which would decrease when adding a SWHere expression's local definitions to the environment. The associated PDF slide contains some of the details, but this measure too failed. The problem is that we couldn't determine what the measure of a recursively defined stream definition should be, since the definition contained free stream variables whose measure had not yet been calculated. Current solution ---------------- At this point we have fallen back to performing a shallow embedding of each Cryptol program, where each stream is defined as an ACL2 function. This requires us to performing lambda-lifting, since ACL2 function definitions can't be nested, and to convert stream comprehensions to expressions that manipulate stream indexes explicitly. The advantage of this approach is that it is feasible; the disadvantage is that now evaluators must inspect the ACL2 translation of each every Cryptol program being certified, rather than once-and-for-all inspecting a semantics function for the Cryptol language as a whole. A possible long-term solution: higher order functions in ACL2 ------------------------------------------------------------- Since it is relatively straightfoward to model BitCryptol in higher order logic, I am advocating that ACL2 add some form of first-class functions as new ACL2 objects. However, I am *not* suggesting that we turn ACL2 into a higher order logic! Rather, we could follow the same approach that set theory takes, and continue to have a first order logic with functions as a special kind of data value. We would still have to be careful to avoid the logical paradoxes that can occur when one is allowed to apply a function to itself as an argument. So I'm suggesting that objects in ACL2 have a "rank" associated with them. The normal atoms and cons cells of ACL2 would have rank zero. A function that can take rank zero objects to rank zero objects would have rank one. A function that can take rank zero and rank one objects to rank zero or rank one objects would have rank two, and so on. A function object would be defined to return nil whenever it is applied to a value whose rank is too high. In particular, this would guarantee that applying a function object to itself results in nil. With this model, function objects can easily be modeled as ZFC sets of pairs, which should rule out any logical inconsistancy. Matt Kaufmann has pointed out that we also need to make sure that the axioms comprising function objects form a conservative extension to ACL2, so that they behave nicely with respect to local events and functional instantiation. The PDF slides briefly sketch out one possible formulation of function objects, but others are certainly possible. In fact, Matt Kaufmann has recently started working on a promising alternative proposal for adding function objects to ACL2.