A Formal and Executable Model of the x86 Instruction Set Architecture
These books contain the specification of x86 instruction set architecture (ISA); we characterize x86 machine instructions and model the instruction fetch, decode, and execute process using the ACL2 theorem-proving system. We use our x86 ISA specification to formally verify x86 machine-code programs.
These books include:
For information on how to certify these books, see x86isa-build-instructions.
The utility of a formal ISA model for performing machine-code verification depends directly on the model's completeness (with respect to the ISA features specified), accuracy, and reasoning and execution efficiency. Intel software developer's manuals were used as specification documents, and ambiguities were resolved by cross-referencing AMD manuals and running tests on real machines to understand processor behavior.
The current focus of these books is Intel's IA-32e mode, which includes 64-bit and Compatibility sub-modes, and the 32-bit protected mode.
To see a list of opcodes implemented in these books, see implemented-opcodes.
These books contain a formal x86 ISA model that is not only
capable of reasoning about x86 machine-code programs, but also of
simulating those programs. See program-execution for
instructions on how to set up the
Reasoning efficiency is desirable to make code proofs tractable and execution efficiency is desirable to enable faster model validation via co-simulations (See model-validation).
However, simple definitions that are suitable for reasoning typically offer poor execution performance and definitions optimized for execution efficiency often use a sufficiently different algorithm from the one used in the specification that they are difficult to reason about. However, ACL2 offers several features to mitigate this trade-off between reasoning and execution efficiency. For e.g.: mbe and its friends like mbt, and assert*, ACL2::stobj, and defabsstobj.
ACL2 features like abstract stobjs and
The complexity of the x86 ISA model will increase as more features are added to it, and this added complexity will make reasoning inevitably more involved. The issue of balancing verification effort and verification utility is a very pertinent one. For example, users might not want to reason about an application program at the level of physical memory, i.e., taking into account address translations and access rights management provided by the memory management data structures. This is because it is customary for application programs not to have direct access to the system data structures. The memory model seen by 64-bit application programs is that of linear memory, which is an OS-constructed abstraction of the complicated underlying memory management mechanisms like paging that are based on physical memory. Therefore, verification of application programs can be performed at the level of linear memory, if the OS routines that manage the linear memory abstraction can be either trusted or proved correct. However, the verification of system programs, like kernel routines, must necessarily be done at the level of physical memory since these programs can access/modify system data structures.
In light of the above, the x86 ISA model provides the option to deactivate some features of the ISA, enabling the user to do two kinds of analysis, depending on the kind of programs being considered for verification. Specifically, the model offers the following views of x86 machines:
In this view, the model
attempts to provide the same environment for reasoning as is provided
by an OS for programming. It allows the verification of an
application program while assuming that memory management, I/O
operations, and services offered via system calls are provided
reliably by the underlying OS. The memory model here supports 64-bit
linear addresses specified for IA-32e machines. A specification of
system calls like
This view includes the specification for IA-32e paging and segmentation; in particular, ISA-prescribed data structures for memory management and (partial) specifications of system-mode instructions like LLDT and LGDT are available in this mode. The memory model here characterizes a 52-bit physical address space, which is the largest physical address space provided by modern x86 implementations. This view is intended to be used to simulate and verify software that has supervisor privileges and interacts with I/O devices.
An added benefit of having these two separate views is the increased execution speed of programs in the application-level view; this is because executing these programs in this view does not require simulating both the physical address space (and hence, accesses/updates to the paging data structures) and the linear address space.
It would be beneficial, not to mention interesting, to verify whether the application-level view is an abstraction of the system-level view, given that the system data structures have been set up correctly. As of now, establishing this relationship between the two modes is out of the scope of this project.