A Mechanized Program Verifier
J Strother Moore
Let me begin by apologizing for my late arrival. The
faculty recently selected me as one of their three
representatives on a committee to search for the next
President of the University of Texas at Austin. Serving on this
committee is crucial to the future of my department. But
the governing board stipulated that members of the committee
must attend every meeting. One of those meetings was held
day before yesterday. So I got here as fast as I could.
Department of Computer Sciences
University of Texas at Austin
The Verification Grand Challenge Workshop
October 10--13, 2005
We seek a grand challenge related to software verification -- a task that is
supposed to be at once daunting and achievable, practical and inspiring.
But how can software verification be inspiring?
Let's take a step back, to the roots of our subject:
The computing revolution is just that: a revolution. The
old order is being replaced -- everywhere -- by new
paradigms that fundamentally exploit our ability to compute.
Slide 1 - 19th Century Factory Workers
The industrial revolution augmented the human body --
it magnified what hands and arms could do.
The computing revolution is augmenting the human mind.
Slide 2 - Colossus, Bletchley Park, England
It was not a coincidence that Colossus was designed to
process information -- to help a person think.
Some of this revolution is occurring because people are
applying computing to well understood mathematical tasks --
simulating sets of differential equations. We are
witnessing the virtual recapitulation of the 19th century.
These in silico experiments enable scientists to test
hypotheses they could not previously test -- to conduct
experiments they could not previously conduct:
What will tomorrow's weather bring?
How will traffic flow between these cities?
How will the planet's temperature change over the next century?
Will these two molecules dock?
Some of you may remember theorem proving in the late 1960s
and early 1970s. I do. Computing allowed us to
``recapitulate'' some of the fundamental ideas of Turing,
von Neumann and others. We soon learned what an exponential
search space was REALLY like and how sparsely it was
populated by interesting theorems. We had to develop new
algorithms and heuristics. And in so doing we helped invent
some fundamental computer science, like garbage collection,
memory management schemes, unification, resolution,
Davis-Putnam-Loveland, Lisp, Prolog, etc.
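As a reminder of how small the seeds of those ideas were, here is a minimal sketch of the Davis-Putnam-Logemann-Loveland procedure in Python (an illustration only, not any historical implementation; the clause encoding is my own):

```python
# A minimal DPLL sketch. A formula is a list of clauses; each clause
# is a list of nonzero integers, where a negative integer denotes a
# negated variable. Returns a satisfying set of literals, or None.

def dpll(clauses, assignment=None):
    if assignment is None:
        assignment = set()
    # Simplify: drop satisfied clauses, delete falsified literals.
    simplified = []
    for clause in clauses:
        if any(lit in assignment for lit in clause):
            continue                      # clause already satisfied
        reduced = [lit for lit in clause if -lit not in assignment]
        if not reduced:
            return None                   # empty clause: branch fails
        simplified.append(reduced)
    if not simplified:
        return assignment                 # every clause satisfied
    # Unit propagation: a one-literal clause forces its literal.
    for clause in simplified:
        if len(clause) == 1:
            return dpll(simplified, assignment | {clause[0]})
    # Split on some literal and try both truth values.
    lit = simplified[0][0]
    return (dpll(simplified, assignment | {lit})
            or dpll(simplified, assignment | {-lit}))
```

Even on this toy, the exponential case split at the end is exactly where the sparse search space makes itself felt.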
So when I say that scientists are using computing to
``recapitulate the 19th century'' do not get me wrong --
they are often developing fundamental new algorithms,
representations, and strategies that are part of computer science.
These applications of computing are of great importance.
They allow us to think in new ways about weather, economic
planning, biology, etc. They expand what we can think
about. They aid the thinker the way a microscope or
telescope does -- they expand the human eye and, in fact,
can provide an instrument for peering through time to
hypothetical pasts and futures, to compress or expand time
so we comprehend the changes we can now witness.
But does this style of computing fundamentally augment the
mind itself? Can we say that these applications reason?
I say they do not.
Slide 3 - Sparsely Inhabited Coast
If the augmentation of the human mind were a dark continent
to be explored, then such applications are on the coast.
I say let us explore.
I say let us mount an expedition.
Let us go deeply into the heart of the continent. We
will not map the whole thing -- only a narrow path. But
let's get off the coast. Let's build a track into the interior.
Slide 4 - A Path to the Interior
Let's build a machine that reasons.
What do I mean by ``reason''? I mean to deduce the
consequences of more or less arbitrary assumptions.
Here's my suggestion. Let's build a system that can follow
-- effortlessly -- the reasoning of a single talented human
being engaged in some deep problem solving task. So ... the
machine must be much more powerful than today's mechanical theorem provers.
The problem solving task will require great creativity on
the part of the human. So ... the machine must be able to
follow -- if not discover -- those creative leaps.
The task will require the human to draw upon a large body of
knowledge from many different intellectual disciplines. So
... the machine must be able to read and ``understand'' that
body of knowledge.
The task will require the human to invent new concepts
specific to the task at hand. So ... the machine must deal
with some general purpose language in which arbitrary new
concepts can be introduced and manipulated.
The task will require the human to interact with other
similarly augmented people. So ... the machine must be
flexible and able to reject plausible solutions for reasons
outside its sphere of expertise.
But we're not tackling the AI problem. We're not trying to
map the whole dark continent. Remember, we're building a
mere path -- a track -- a trail -- into the interior.
To give us hope that we can succeed, let's impose one
constraint on the problem solving task the human is working
on: the solution is to be expressed as a piece of software.
What I've just described is a program verification system.
Consider the people in this room... who is better equipped
to build a machine that can reason?
Who has a subject domain better suited for this challenge
than software is? A subject domain that is as important as
software -- but as limited as software. A subject domain that
can be accurately described in a formal language.
And -- since the authors of the first reasoning system will
undoubtedly be computer scientists -- what is the subject we
know the most about? Software.
Finally, consider the users of the first interesting reasoning machines.
Theorem four_color : (m : (map R))
(simple_map m) -> (map_colorable (4) m).
Slide 5 - The Coq Statement of the Four Color Theorem
They will have to speak the same language as the reasoning
system, to communicate with it. That will not be a natural
language -- it will not be English or French -- at least not
for a long time. It will be an artificial language. Is
there an audience more likely to accept an artificial
language than our potential users -- the programmers of the world?
These questions suggest the obvious -- if anybody can build
a machine that is said to reason -- at least a machine that
is said to reason WELL about something that is important to
many people -- it is this group, the people in this room.
In my contributed paper I did not try to put this argument
in inspirational terms. I wrote:
The verification community seeks to build a system that can
follow the reasoning of an average human engaged in a
creative, computational, problem solving task lasting
months or years and spanning an arbitrary set of
intellectual disciplines.
We should aim for the specification and machine checked
proof of full functional correctness of programs of
inherent importance and interest.
Now step back and reconsider what was just said: we seek to
build a system that follows the reasoning of an average
human in an arbitrary, creative, computational, problem solving task.
This is the theorem proving problem, perhaps restricted to
a computational logic. We seek to build a system that
reasons about computation.
I believe one is not likely to achieve a goal unless one
identifies precisely what the goal is. My goal is to build
a practical theorem prover for a computational logic. I
believe that any attack on the verification problem will
fail unless theorem proving -- machine assisted reasoning
-- is at its heart.
In order for a program to be capable of learning something
it must first be capable of being told it.
Slide 6 - John McCarthy in ``Programs with Common Sense''
otherwise known as ``The Advice Taker paper'', 1959.
Someday we will see a system that can discover why a piece
of software is correct. But you have to learn to walk
before you learn to run. We should aim for a system that
merely understands why our software is correct. We should
aim for a system to which we can explain our software -- and
a system in which a suitable explanation can be produced in
an acceptable amount of time.
These may seem like modest goals. But nobody -- certainly
nobody in this room -- believes we're there. Is your system
that good? Is mine?
Let me tell you a story about a proof. You can find a technical paper about
this proof, by Qiang Zhang and me, in the 2005 proceedings of TPHOLs.
Slide 7 - A Graph with Some Explored Paths
I decided to prove the correctness of Edsger Dijkstra's shortest path
algorithm, using ACL2, the theorem prover developed by Matt Kaufmann and me.
Of course, I knew the proof. By the time I sat down at the keyboard with
ACL2, I had thought about the proof occasionally for several days.
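For those who do not know the algorithm, here is a sketch in Python (an illustration only; the ACL2 formalization itself is in the paper). The assertion marks the invariant at the heart of the correctness proof: when a node is finalized, its recorded distance can never again be improved.

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbor, weight), ...]} with weights >= 0.
    Returns the map of shortest distances reachable from source."""
    dist = {source: 0}
    done = set()
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node in done:
            continue                      # stale queue entry
        done.add(node)
        # Invariant of the correctness proof: d is the true
        # shortest distance from source to `node`.
        assert d == dist[node]
        for neighbor, weight in graph.get(node, []):
            new = d + weight
            if new < dist.get(neighbor, float("inf")):
                dist[neighbor] = new
                heapq.heappush(queue, (new, neighbor))
    return dist
```

The ten lines of code are trivial to write; spelling out why that one assertion always holds is the weekend-long part.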
I gave myself a weekend, when my wife was away. I started
Saturday morning. Ten hours later, I was not done. I was
frustrated by how many obvious facts I had to prove and how
stupid the theorem prover was. Every fact had to be spelled
out. So I quit for the day -- that is, I
quit sitting at the keyboard. But of course, by then I had so
internalized the problem that I thought about it while I
made dinner. I thought about it while I ate. I thought about
it while I watched TV. I thought about it while I slept.
I sat down at the keyboard again the next morning. Within a
few hours I was so frustrated that I got up and went
outside to do some gardening.
It was summer -- in Texas -- the sun was blazing --- the
temperature was about 36 degrees C. There was a bush in our
garden that we wanted removed. Using an ax, I chopped it out.
When I had finished -- exhausted -- I went back into the
house, I took a shower, I sat down, and I finished the proof
in a few more hours.
Slide 8 - A Desert Plant (but not in my garden!)
So what does this story tell us about the experience of an
expert using a state-of-the-art theorem prover to verify a
classic algorithm designed by a man who valued simplicity
above all else?
It tells us that the experience is painful and frustrating
compared to chopping out bushes in the blistering heat of
a Texas summer.
The state of the art is still pretty poor!
Those of us in the so-called ``Boyer-Moore'' community have
been trying to build an empowering program verification
system for 35 years.
In my paper I describe some of the basic assumptions of the Boyer-Moore
community and I provide evidence for why they may permit the construction of
an empowering program verification system. I think of our work as an
existence proof: it is possible to build a program verification system
powerful enough to permit its adherents to use it to build
the software they need.
Here is a chart I prepared in 1996, when I was interviewing
for a job at UT Austin. At the time, I worked at
Computational Logic, Inc.
Slide 9 - Boyer-Moore Project 1971-96
It starts when Boyer and I met in 1971 and has some of the
highlights of our work and our students' work.
The highlight of 1971 was the associativity of append.
In 1973, it was insertion sort.
And so it goes.
In 1988, it was the Piton assembler.
In 1990, it was the full CLI stack.
In 1993, it was the Motorola CAP digital signal processor.
In 1995, it was floating point division on the AMD K5 -- before fabrication.
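The earliest milestones on that chart are small enough to restate as executable properties. Here is a Python illustration (not the Boyer-Moore prover's Lisp logic): recursive append and insertion sort, with the 1971 and 1973 theorems checked on examples.

```python
def app(xs, ys):
    """List append, defined recursively as in the 1971 theorem."""
    return ys if not xs else [xs[0]] + app(xs[1:], ys)

def insert(x, xs):
    """Insert x into a sorted list xs, keeping it sorted."""
    if not xs or x <= xs[0]:
        return [x] + xs
    return [xs[0]] + insert(x, xs[1:])

def isort(xs):
    """Insertion sort -- the 1973 milestone."""
    return [] if not xs else insert(xs[0], isort(xs[1:]))

# Associativity of append, checked on examples here; the theorem
# prover establishes it for ALL lists, by induction.
for a, b, c in [([], [1], [2]), ([1, 2], [3], [4, 5])]:
    assert app(app(a, b), c) == app(a, app(b, c))

assert isort([3, 1, 2, 1]) == sorted([3, 1, 2, 1])
```

The distance from these ten-line theorems to a fabricated floating point unit is the distance the chart records.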
Slide 10 - Some Companies where ACL2 Has Been Used
Since 1996, we've done software verification work for a
number of companies -- verifying code that is crucial to the
correctness of products.
Slide 11 - High-Level Architecture of ACL2
Here is a picture that shows the key parts of the system:
the user, the theorem prover, and the database of proved theorems.
But one of the most important decisions we made was to build
a system that we could imagine could be used to verify its
own important properties.
Slide 12 - High-Level Architecture of ACL2 (revealed)
Don't get hung up on the apparent logical challenges. I'm
not suggesting that we build a system that can correctly
establish its own correctness.
I mean we should build a system in some programming language
for which we have a mechanically supported formal semantics
and a mechanically supported reasoning engine -- and that we
be able at least to state within the system what it means for
our system to be correct.
If you are working on a smaller piece of the problem -- if
you are building a system whose expressive power and
implementation is beyond the scope of your own work -- then
you should find somebody to deal with the problem you are
creating! When that process reaches transitive closure, we
will have a community working on a grand challenge.
Another good lesson of the Boyer-Moore community is that we
should present the user of our system with only one
language. That language, of necessity, must be both an
executable programming language and a mathematical logic.
In my view, this eliminates a lot of cognitive dissonance
and simplifies the problem. In our case, the language is
the functional subset of ANSI standard Common Lisp.
But the particular language is unimportant -- as long as it
is seamlessly integrated into the reasoning engine so that
no other language is ever encountered.
Code in that language is executed to perform the
computations the user cares about.
This is an incredibly important point, so I will say it again:
Code in that language is executed to perform the
computations the user cares about.
One implication is that the execution environment must be
efficient, rugged, and scalable -- capable of dealing with
data of practical size and complexity. If the
straightforward way of expressing an algorithm is too
inefficient, the user must have recourse to other ways of
expressing it and it must be possible to reason with the
system to show that the efficient expression performs the
same computation.
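That pattern can be made concrete with a toy (Python, illustrative only, not ACL2): a straightforward definition that is too slow, an efficient reformulation, and the equivalence obligation one would then discharge with the reasoning system.

```python
def fib_spec(n):
    """The straightforward definition: clear, but exponential time."""
    return n if n < 2 else fib_spec(n - 1) + fib_spec(n - 2)

def fib_fast(n):
    """The efficient version: linear time, constant space."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# The proof obligation: for all n, fib_fast(n) == fib_spec(n).
# Here we merely test it; the reasoning engine proves it by
# induction, after which only fib_fast need ever be executed.
assert all(fib_fast(n) == fib_spec(n) for n in range(15))
```

The point is that both definitions live inside the one language, so the equivalence is a theorem of the system, not a side agreement.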
I personally regard it as a failure of our whole school of
thought to have to escape from the reasoning system to do a
computation efficiently enough. Such dilemmas have forced
us, time and time again, to extend the language supported by
the reasoning tool.
We have verified code that deals with gigabytes of data.
We have verified code that manipulates memory reads and
writes at nearly C speeds.
The world's first silicon Java Virtual Machine was first modeled in ACL2.
And the model executes at 90 percent of the speed of the C model.
Slide 13 - Rockwell-Collins / aJile Systems JEM1
Our models are compared to other engineering models by simulating
them side-by-side. In these comparisons, the formal models are
executed millions and millions of times.
Slide 14 - AMD Floating Point Design Process
Executability is crucial. And remember, I do not mean just
the theoretical burden of reducing ground terms to
constants. I mean the practical burden of doing it
efficiently. I mean what I said before:
the language is used to perform the computations you care about.
The unified language approach also means that the properties
that the user verifies are expressed in the language.
Here I have to clarify our rules a little -- and let me be
clear that these are the rules I chose to follow, not ones I'm
insisting you follow. There are
some constructs in the language that cannot be
executed. In our system some function symbols
are merely constrained by axioms. These constrained symbols
allow the user to express existential quantification and to
specify systems without over-specifying them.
So not every expression in the language is an executable
program. But every executable program is an
expression in the language.
The unified language approach also means that to interact
with the reasoning engine, the user writes expressions in
the language. This means that hints, guidance, extensions,
debugging information, etc., are all communicated via the language.
In our system, this is helped by the fact that the entire
reasoning environment -- including the engine itself -- is
written in the language. Thus, its internal data structures
are just constants in the language and everything is visible
and manageable by the user who speaks the language.
This has implications for the use of external tools.
Ideally, they should be implemented in the same environment
so that -- to the extent that they are open to the user --
their internal data structures can be understood.
I know this flies in the face of much common wisdom about
modularity and software system design. But it helps
illustrate, scientifically, an argument for why this is a
grand challenge: we must find a system design methodology
that permits a seamless, fully integrated system to be built
by a diverse group of people.
This also highlights why we are the people to do this: if we
are right that formal specification and proof provide a
practical way to build reliable software, then we
ought to be able to use our tools to build the tools we need.
Finally, it will be necessary from time to time to escape
the system and descend to a lower level language -- or even
to the machine architecture.
This can be done. It was done in the CLI stack about 20 years ago!
Slide 15 - The CLI Verified Stack (OS kernel and applications not shown)
Let me pause here. If you don't know about the verified
stack of system components built by Computational Logic,
Inc, in the late 1980s, you should. If your students don't
know about it, they should. In my view, you cannot hope to
deal with software verification unless you can deal with
multiple layers of abstraction in a practical way. The CLI
stack moved from a micro-architectural view of a fabricated
microprocessor, through the ISA, an assembler, and two
compilers, to high level applications and an operating
system kernel -- all verified mechanically. I cannot tell
you how many students I have met around the United States
who are working on one piece of that problem in complete
ignorance of what has already been done.
Returning to the issue of mapping to another programming
language, the obvious approach works if your tools are good
enough: formalize the semantics operationally in your system
and prove theorems about constants representing programs in
the other language.
The paper contains many, many references to successful
examples of this.
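The recipe just stated -- formalize the semantics operationally, then prove theorems about constants representing programs -- can be illustrated with a toy (a Python sketch; not Piton or the ACL2 JVM models):

```python
# A deep embedding in miniature: programs in the "other language"
# are just constants, and an interpreter gives them their
# operational semantics.

def run(program, stack):
    """Operational semantics of a tiny stack language."""
    for op in program:
        if op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:                       # ("push", n)
            stack.append(op[1])
    return stack

# A program is a constant; a theorem about it is a statement about
# what `run` does to it.  E.g., this program computes (2 + 3) * 4:
PROG = [("push", 2), ("push", 3), "add", ("push", 4), "mul"]
assert run(PROG, []) == [20]
```

Scale the interpreter up to a real ISA or virtual machine and you have exactly the interpreter-style proofs discussed next.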
You may think that deep embedding is unnecessarily awkward
and burdensome. I don't believe so. In the first place,
the reasoning capability required for interpreter-style
proofs is no different than any other. It just stresses the
scale and performance -- but we're already committed to that.
In the second place, it positions the project perfectly for
the step of enlarging the scope of the reasoning engine to
deal with other programming languages: all that is required
is to formalize the semantics of those languages.
In a paper published last year in the Journal of the Spanish Academy of
Science, Matt Kaufmann and I discussed the eight key research problems
we see standing in the way of the grand challenge.
1. Automatic Invention of Lemmas and New Concepts
This is the problem. How do we invent proofs --
especially inductive proofs?
2. How to use Examples and Counterexamples
I believe this problem is part of the solution of the
first one. I believe people frequently use examples to
guide the search for a proof -- and more often use
examples to guide the search for accurately stated
theorems and inductive invariants.
3. How to use Analogy, Learning, and Data Mining
How do we learn new concepts? How do we come to
understand their patterns of motion in terms? Proof by
analogy is often a reasonable substitute for
higher-order features. That is, instead of stating a
higher-order general scheme, prove the instances
automatically as needed. How do we learn what moves are
likely to work in a given context and what moves won't?
4. How to Architect an Open Verification Environment
In any sufficiently automatic system, the system's
strategy will occasionally thwart the proof strategy the
user has in mind. It must be possible to override
virtually every automatic decision -- in a sound way.
This is also the place where we will address the problem
of plugging together different tools -- other provers,
decision procedures, static analyzers, etc.
5. Parallel, Distributed and Collaborative Theorem Proving
Big proof projects involve many people working on
interconnected parts of the problem. Big projects
require ``proof maintenance'' where previously
constructed proofs must be re-constructed for slightly
different models and specs. How do we support
distributed proof development and maintenance?
6. User Interface and Interactive Steering
How do we combine a fully automatic proof search engine
with one that allows the user to supply proof sketches,
hints, and strategic guidance? How do we inform the
user of what the system is doing so the user can help --
how can we do that without drowning the user in a sea of detail?
7. Education of the User Community -- and Their Managers
Even if we succeed in building an empowering
verification environment, we have to motivate and
educate people to use it.
8. How to Build a Verified Theorem Prover
At first sight, this goal may not seem to be on the
critical path. However, it is difficult to sell a
product you don't use -- and that includes formal verification tools.
Remember, this paper asks the questions. It doesn't answer them.
In summary, my main advice is simply to accept the grand
challenge of formal methods itself.
Build only those things you can imagine specifying and
verifying -- and if your group cannot handle its own
system, then find some partners who can build the part you can't.
Our community is perfectly positioned as the vanguard of the
revolution. If we accept the challenge, we are likely to be
the first group to build a machine that reasons.
All of you using mechanized deduction engines have
experienced the thrill of having a machine tell you that you
were wrong about something you firmly believed!
You may have experienced the elation of having a machine
find a proof before you did.
You have felt the relief of knowing that the burden of
soundness is on the machine.
You were the first in the world to know what it feels like
to have a machine to reason with -- to educate, to explain
to, to argue with, to be enlightened by.
If we accept the grand challenge we could bring that
experience to millions of programmers and truly begin to
revolutionize human thought.