Research Topics

I am working on several research topics. These are not presented in any particular order -- I turn my attention to each topic as time permits and as the interests of my students and colleagues proceed.

Much of my work is concerned with using or developing the ACL2 theorem proving system.

X86 Specification in ACL2

We are working on a formal specification for a small subset of the X86 instruction set architecture (ISA). Attempting to specify the entire X86 ISA would require a very substantial effort -- well beyond our available resources. Even so, we believe our subset may have utility to some. Microsoft has been generous enough to provide us with some support to further this effort.

As a starting point, we have develped an ACL2-based specification of the Y86 microprocessor which is described in "Computer Systems, A Programmer's Perspective" by Bryant and O'Hallaron. The Y86 is a very simple microprocessor, with a X86-like ISA; we are using this specification as a starting place for our effort. The source files for the model, whend loaded into ACL2, provide a running simulator of the Y86 microprocessor design.

We believe that developing a formal instruction-set architecture (ISA) specification for the X86 architecture could serve a number of purposes.

  • (1) Documentation to be shared by hardware designers and software developers.
  • (2) Formal verification of RTL designs.
  • (3) Testing of RTL designs through "co-simulation".
  • (4) Simulation for the purpose of testing software.
  • (5) Formal verification of X86 software.

    We believe that such a model should provide a means to specify a multiprocessor configuration, so it will be necessary to evolve a "memory centric" specification that permits interleaved execution. Each copy of an ISA processor specification, whether virtual (as in the case of hyper-threading) or physical (such as with multi-core implementations), must include its own register file, user/supervisor state, interrupt controller, and memory management unit.

    In light of the complexity of the X86 ISA architecture and the sheer amount of work necessary to create such a formal model, it will have to be done in pieces. Here are some obvious elements.

  • (1) Specification of the memory system, and the memory interleaving permitted.
  • (2) Specification of processor state.
  • (3) Specification of the various processor modes.
  • (4) Specification of the integer instructions.
  • (5) Specification of the media instructions.
  • (6) Specification of the floating-point instructions.
  • (7) Specification of supervisor-only instructions.
  • (8) Specification of the memory management mechanisms.
  • (9) Specification of the interrupt controller.

    We have nearly completed elements (1), (2), (3), and (4), including all of the address modes. The allows a skilled ACL2 user to prove the correctness of single-threaded, user-level programs. Our X86 specification can simulate integer-only, user-level programs generated by the GNU and LLVM compilers. We validate our evolving model by co-simulation against real X86 processors. Our X86 specification project has its own Website, where there is information about our progress.

    Hash CONS, Function Memoization, Fast Association Lists

    Bob Boyer and I have been working on an implementation of Hash CONS for use within ACL2. This effort extends the core capabilities of the ACL2 system.

    We have developed an extension of ACL2 that includes the implementation of hash-based association lists and function memoization; this makes some algorithms execute more quickly. This extension, enabled partially by the implementation of Hash-CONS, represents ACL2 objects in a canonical way, thus the comparison of any two such objects can be determined without the cost of descending through their CONS structures. A restricted set of ACL2 user-defined functions may be memoized; the underlying implementation may conditionally retain the values of such function calls so that if a repeated function application is requested, a previously computed value may instead be returned. We have defined a fast association list access and update functions using hash tables. We provide a file reader that identifies and eliminates duplicate representations of repeated objects, and a file printer that produces output with no duplicate subexpressions.

    We have a small collection of other papers related to Hash CONS. If you believe a good reference is missing, please send us a pointer.

    Formal Hardware Description Languages

    We formalized the DE2 hierarchical, occurrence-oriented finite state machine (FSM) language, and have developed a proof theory allowing the mechanical verification of FSM descriptions. Using the ACL2 functional logic, we have defined a syntactic well-formedness predicate and a symbolic simulator that defines the DE2 cycle-based simulation semantics. DE2 is deeply embedded within ACL2, and the DE2 language includes an annotation facility that can be used by programs that manipulate DE2 descriptions; this facility may restrict the use of defined modules or it may provide other module information. The DE2 user may write and prove the correctness of programs that generate DE2 descriptions.

    Erik Reeber (mostly) and I have been developing this language as a vehicle to explore how we should extend current hardware description languages to make them more useful to the design process. For instance, the DE2 language permits a number of embedded annotations -- these annotations are first class parts of a design description and they can be analyzed just like the functional specification part.

    DE2 is designed to permit the rigorous hierarchical description and hierarchical verification of finite-state machines (FSMs). We call our language DE2 (Dual-Eval 2) because of the two-pass approach that we employ for the language recognizers and evaluators. DE2 is actually a general-purpose language for specifying FSMs; users may define their own language primitives. We recognize valid DE2 descriptions with an ACL2 predicate that defines the permissible syntax, names, and hierarchy, of valid descriptions. The DE2 language also provides a rich annotation language that can be used to enforce syntactic and semantic design restrictions.

    Erik Reeber and I describe the DE2 language in our paper the ``Formalization of the DE2 Language'', which appears in Correct Hardware Design and Verification Methods (CHARME 2005), Lecture Notes in Computer Science}, No. 3725, pages 20-34, © Springer-Verlag, 2005.

    We have also have built a verifying compiler that translated DE2 circuits into cycle-accurate ACL2 models. It is a verifying compiler in that it also produces an ACL2 proof that the ACL2 model is equivalent to the DE2 circuit, with respect to the DE2 semantics, formalized in ACL2. Properties regarding a finite number of steps of the ACL2 model can be written in the Subclass of Unrollable List Formulas in ACL2 (SULFA), for which we have developed a SAT-based decision procedure. We describe SULFA and this decision procedure in our paper "A SAT-Based Decision Procedure for the Subclass of Unrollable List Formulas in ACL2 (SULFA)" (abstract), which will appear in the 3rd International Joint Conference on Automated Reasoning (IJCAR 2006).

    We have used our methodology to mechanically verify components of the TRIPS microprocessor implementation. We compiled Verilog components into DE2, then used our verifying compiler to produce an equivalent ACL2 model. We then used a mixture of theorem proving and our SAT based decision procedure to verify that the ACL2 model satisfies its ACL2 specification.

    Computational Biology -- TASPI

    Serita Nelesen and I are developing the TASPI experimental system for analysis, retrival, and storage of phylogenetic information. TASPI is implemented as an ACL2 program.

    Phylogenetic tree searching algorithms often produce thousands of trees which biologists save in Newick format in order to perform further analysis. Unfortunately, Newick is neither space efficient, nor conducive to post-tree analysis such as consensus. We propose a new format for storing phylogenetic trees that significantly reduces storage requirements while continuing to allow the trees to be used as input to post-tree analysis. We implemented mechanisms to read and write such data from and to files, and also implemented a consensus algorithm that is faster by an order of magnitude than standard phylogenetic analysis tools. We demonstrate our results on a collection of data files produced from both maximum parsimony tree searches and Bayesian methods.

    In 2005, we (Bob Boyer, Serita Nelesen, and I) published the paper "A Compressed Format for Collections of Phylogenetic Trees and Improved Consensus Performance"; this paper appeared in Algorithms in Bioinformatics: 5th International Workshop, WABI 2005. Lecture Notes in Computer Science 3692, pages 353-364, Springer-Verlag, 2005.

    Integration of ACL2 and HOL4

    Mike Gordon and I started an experimental ACL2-HOL4 integration project in the summer of 2005, when we constructed a means of embedding the ACL2 logic in HOL4. This was done by defining a model of ACL2 datatypes (meaning a single type that can represent ACL2 symbols, characters, numbers, strings, and CONS (pairing) operations inside of HOL; we then defined a translator from this level of representation from HOL to ACL2 and back again.

    Matt Kaufmann was quickly drafted to help us with mechanizing the definition of all ACL2 primitive functions; Matt created a tool that prints the definition of the primitive ACL2 functions into HOL4. James Reynolds has been using the complete system to verify some properties of HOL definitions by first translating them into ACL2, and then using the ACL2 theorem-proving system.

    Mike Gordon has a Webpage where interested parties may investigate this effort.

    Transistor-level Circuit Analysis

    Sol Swords and I are working on analysis of transistor-level circuits. Currently, we have created an analysis program similar in character to Bryant's COSMOS system . Our first goal is to make a smoothly functioning analysis system that includes all of the analysis features found in COSMOS. Our long-term goal is to improve the analysis capabilities of our system beyond those of COSMOS.

    We express the circuit as graph of wires connected by transistors. We break th graph into channel-connected subcomponents for piecewise analysis. We use Gaussian elimination to find three-valued formulas represented as BDDs expressing the steady state behavior of each wire. This model views the wires between transistors as having one of three values (TRUE, FALSE, UNKNOWN) instead of a voltage. The model includes provisions for transistors of different strengths and wire capacitances; it analyzes off-sets and on-sets separately in order to conservatively determine when a wire will have a good Boolean state as opposed to an indeterminate value. Once a formula is found for each wire, we symbolically simulate the resultant system using a unit-transistor-delay model.

    Verification of Control Systems

    For a Hybrid system, we assume the system is described by one or more systems of differential equations and some criteria for switching from one system of equations to the other or instantaneously modifying the system variables. Of particular interest is when the switching of the system of equations or the change of variable value is dictated by the output of some computer which is said to be the controller of the system. Our interest is to research methods for reasoning about the semantics of the computer program as well as the behavior of the dynamic system. With the use of nonstandard analysis, we are using proof methods often used for reasoning about computer programs to reason about continuous dynamic systems interacting with computer systems. The formal verification of control sytems using ACL2(R) is the subject of Shant Harutunian's PhD research; he is an ECE PhD Candidate.

    A simple switching system is a bouncing ball. The position y of the ball perpendicular to the impact surface may be described by a differential equation: dy/dt = v0-(a*t), where the impact surface is at y=0, the initial ball velocity is v0, acceleration due to gravity is a, and time is t. The solution to this equation is y(t)=(v0*t)-(0.5*a*(t^2)), where y(0)=0. The impact of the ball with the impact surface causes the vertical velocity of the ball to instantaneously change its sign. While the solution to the differential equation describes the behavior of the ball prior to impact, the concatenation of the system solutions initialized with the velocity immediately after impact describes the behavior of the ball over time. Hence, given a system solution y(v0,t), and given a sequence S, not necessarily finite, of tuples where each tuple is a velocity and time immediately after impact, one may then determine the position of the ball over time. However, one must derive, based on the dynamics of the system, this sequence S. This sequence may be generated recursively if we define a function such that given a tuple of velocity and time, returns the tuple of the velocity and time of the next element in the sequence, where the time t is the duration between successive impacts. For this case, the function may be found by solving for the time t in the next tuple based on the roots of the equation y(t)=(v0*t)- (0.5*a*(t^2)), and the next velocity using this next time. Assuming positive time and initial velocity v0, given a tuple (v0,t), we let the next tuple (vn,tn) = (e*v0, 2*v0/a), where 0<=e<=1 is the coefficient of restitution. To determine the reachability of some state satisfying a predicate, the sequence S and the system dynamics must be studied jointly. We have found formal logic as being a useful tool in precisely and formally describing and reasoning about such system behavior. If one wishes to reason that the ball, with 0 < e < 1 and an initial non-zero velocity, then the ball eventually reaches a state with zero velocity and position, then one would have to reason about an infinite sequence S, even though the time required to reach this state is finite. While the use of real analysis and limits are useful for reasoning about this behavior, nonstandard analysis provides a useful, formal framework within which to reason that the ball reaches zero velocity and position, after infinite in-elastic impacts, but finite time.

    In our research, we use nonstandard analysis to model the physical dynamics of the system as well as the switching agent or computer controller which causes a change in the system dynamics. By the use of nonstandard analysis, the real time physical system is discretized in accordance to an infinitesimal time step epsilon. However, predicates over the state space are also discretized according to this epsilon. To reason about properties of the dynamic system, such as stability, we turn to the use of invariants and measure functions, which are commonly employed in program verification techniques. Once we prove these necessary properties, by use of the principals of nonstandard analysis, we may then conclude the property about the overall hybrid system over real time.

    We have applied this method for a simple system consisting of a computer controller, and a widget, whereby the state variable x of the widget is modified, with the constraint that the state variable can only be modified by a limited, finite amount in a time duration, and the computer is allowed to sample this value only at discrete time points. Furthermore, while the variable x is real valued, the computer is assumed to be integer valued, taking the floor of the variable x. Using our method, we are able to prove that an algorithm for adjusting the state variable x is such that 1) it causes the widget to eventually reach a sate variable value x within 2.5 of the requested value and 2) it remains within this margin of error. This approach is instructive in demonstrating the use of formal logic and nonstandard analysis in reasoning about a simple real time dynamic system and computer program.

    Improved Arithmetic Procedures for ACL2

    Robert Krug (mostly), J Moore, and I have worked on improving the arthmetic proving procedures in ACL2 in several ways: new ACL2-system code to better support processing of arithmetic expression, new ACL2-system code to support automatic reasoning about arithmetic expressions involving non-linear terms, and new ACL2 libraries (books) that, when included, often decrease the effort required to prove arithmetic facts.

    Some Older Research Topics

    The FM9001 Microprocessor Project

    The FM9001 microprocessor was the first fabricated microprocess with a mechanically-checked proof of its gate-level design. Bishop Brock and I developed this microprocessor and the related technology. To verify the FM9001 design, we formally specified the ISA, we formalized the DUAL-EVAL hardware description langauage (HDL), we specified the FM9001 design in the DUAL-EVAL HDL, and finally, we mechanically verified that our design did indeed implement its ISA-level specification. We have written summary of this effort, which both describes the FM9001 processor as well as the HDL used to implement the FM9001 design.