A Mechanized Program Verifier
J Strother Moore
Let me begin by apologizing for my late arrival. The
faculty recently selected me as one of their three
representatives on a committee to search for the next
President of the University of Texas at Austin. Serving on this
committee is crucial to the future of my department. But
the governing board stipulated that members of the committee
must attend every meeting. One of those meetings was held
day before yesterday. So I got here as fast as I could.
Department of Computer Sciences
University of Texas at Austin
The Verification Grand Challenge Workshop
October 10--13, 2005
We seek a grand challenge related to software verification -- a task that is
supposed to be at once daunting and achievable, practical and inspiring.
But how can software verification be inspiring?
Let's take a step back, to the roots of our subject:
The computing revolution is just that: a revolution. The
old order is being replaced -- everywhere -- by new
paradigms that fundamentally exploit our ability to compute.
Slide 1 - 19th Century Factory Workers
The industrial revolution augmented the human body --
it magnified what hands and arms could do.
The computing revolution is augmenting the human mind.
Slide 2 - Colossus, Bletchley Park, England
It was not a coincidence that Colossus was designed to
process information -- to help a person think.
Some of this revolution is occurring because people are
applying computing to well understood mathematical tasks --
simulating sets of differential equations. We are
witnessing the virtual recapitulation of the 19th century.
These in silico experiments enable scientists to test
hypotheses they could not previously test -- to conduct
experiments they could not previously conduct:
What will tomorrow's weather bring?
How will traffic flow between these cities?
How will the planet's temperature change over the next century?
Will these two molecules dock?
Some of you may remember theorem proving in the late 1960s
and early 1970s. I do. Computing allowed us to
``recapitulate'' some of the fundamental ideas of Turing,
von Neumann and others. We soon learned what an exponential
search space was REALLY like and how sparsely it was
populated by interesting theorems. We had to develop new
algorithms and heuristics. And in so doing we helped invent
some fundamental computer science, like garbage collection,
memory management schemes, unification, resolution,
Davis-Putnam-Loveland, Lisp, Prolog, etc.
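As a reminder of how small the seeds of those ideas were, here is a minimal sketch of the Davis-Putnam-Logemann-Loveland procedure in Python (an illustration only, not any historical implementation; the clause encoding is my own):

```python
# A minimal DPLL sketch. A formula is a list of clauses; each clause
# is a list of nonzero integers, where a negative integer denotes a
# negated variable. Returns a satisfying set of literals, or None.

def dpll(clauses, assignment=None):
    if assignment is None:
        assignment = set()
    # Simplify: drop satisfied clauses, delete falsified literals.
    simplified = []
    for clause in clauses:
        if any(lit in assignment for lit in clause):
            continue                      # clause already satisfied
        reduced = [lit for lit in clause if -lit not in assignment]
        if not reduced:
            return None                   # empty clause: branch fails
        simplified.append(reduced)
    if not simplified:
        return assignment                 # every clause satisfied
    # Unit propagation: a one-literal clause forces its literal.
    for clause in simplified:
        if len(clause) == 1:
            return dpll(simplified, assignment | {clause[0]})
    # Split on some literal and try both truth values.
    lit = simplified[0][0]
    return (dpll(simplified, assignment | {lit})
            or dpll(simplified, assignment | {-lit}))
```

Even on this toy, the exponential case split at the end is exactly where the sparse search space makes itself felt.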
So when I say that scientists are using computing to
``recapitulate the 19th century'' do not get me wrong --
they are often developing fundamental new algorithms,
representations, and strategies that are part of computer science.
These applications of computing are of great importance.
They allow us to think in new ways about weather, economic
planning, biology, etc. They expand what we can think
about. They aid the thinker the way a microscope or
telescope does -- they expand the human eye and, in fact,
can provide an instrument for peering through time to
hypothetical pasts and futures, to compress or expand time
so we comprehend the changes we can now witness.
But does this style of computing fundamentally augment the
mind itself? Can we say that these applications reason?
I say they do not.
Slide 3 - Sparsely Inhabited Coast
If the augmentation of the human mind were a dark continent
to be explored, then such applications are on the coast.
I say let us explore.
I say let us mount an expedition.
Let us go deeply into the heart of the continent. We
will not map the whole thing -- only a narrow path. But
let's get off the coast. Let's build a track into the interior.
Slide 4 - A Path to the Interior
Let's build a machine that reasons.
What do I mean by ``reason''? I mean to deduce the
consequences of more or less arbitrary assumptions.
Here's my suggestion. Let's build a system that can follow
-- effortlessly -- the reasoning of a single talented human
being engaged in some deep problem solving task. So ... the
machine must be much more powerful than today's mechanical theorem provers.
The problem solving task will require great creativity on
the part of the human. So ... the machine must be able to
follow -- if not discover -- those creative leaps.
The task will require the human to draw upon a large body of
knowledge from many different intellectual disciplines. So
... the machine must be able to read and ``understand'' that
body of knowledge.
The task will require the human to invent new concepts
specific to the task at hand. So ... the machine must deal
with some general purpose language in which arbitrary new
concepts can be introduced and manipulated.
The task will require the human to interact with other
similarly augmented people. So ... the machine must be
flexible and able to reject plausible solutions for reasons
outside its sphere of expertise.
But we're not tackling the AI problem. We're not trying to
map the whole dark continent. Remember, we're building a
mere path -- a track -- a trail -- into the interior.
To give us hope that we can succeed, let's impose one
constraint on the problem solving task the human is working
on: the solution is to be expressed as a piece of software.
What I've just described is a program verification system.
Consider the people in this room... who is better equipped
to build a machine that can reason?
Who has a subject domain better suited for this challenge
than software is? A subject domain that is as important as
software -- but as limited as software. A subject domain that
can be accurately described in a formal language.
And -- since the authors of the first reasoning system will
undoubtedly be computer scientists -- what is the subject we
know the most about? Software.
Finally, consider the users of the first interesting reasoning machines.
Theorem four_color : (m : (map R))
(simple_map m) -> (map_colorable (4) m).
Slide 5 - The Coq Statement of the Four Color Theorem
They will have to speak the same language as the reasoning
system, to communicate with it. That will not be a natural
language -- it will not be English or French -- at least not
for a long time. It will be an artificial language. Is
there an audience more likely to accept an artificial
language than our potential users -- the programmers of the world?
These questions suggest the obvious -- if anybody can build
a machine that is said to reason -- at least a machine that
is said to reason WELL about something that is important to
many people -- it is this group, the people in this room.
In my contributed paper I did not try to put this argument
in inspirational terms. I wrote:
The verification community seeks to build a system that can
follow the reasoning of an average human engaged in a
creative, computational, problem solving task lasting
months or years and spanning an arbitrary set of
intellectual disciplines.
We should aim for the specification and machine checked
proof of full functional correctness of programs of
inherent importance and interest.
Now step back and reconsider what was just said: we seek to
build a system that follows the reasoning of an average
human in an arbitrary, creative, computational, problem solving task.
This is the theorem proving problem, perhaps restricted to
a computational logic. We seek to build a system that
reasons about computation.
I believe one is not likely to achieve a goal unless one
identifies precisely what the goal is. My goal is to build
a practical theorem prover for a computational logic. I
believe that any attack on the verification problem will
fail unless theorem proving -- machine assisted reasoning
-- is at its heart.
In order for a program to be capable of learning something
it must first be capable of being told it.
Slide 6 - John McCarthy in ``Programs with Common Sense''
otherwise known as ``The Advice Taker paper'', 1959.
Someday we will see a system that can discover why a piece
of software is correct. But you have to learn to walk
before you learn to run. We should aim for a system that
merely understands why our software is correct. We should
aim for a system to which we can explain our software -- and
a system in which a suitable explanation can be produced in
an acceptable amount of time.
These may seem like modest goals. But nobody -- certainly
nobody in this room -- believes we're there. Is your system
that good? Is mine?
Let me tell you a story about a proof. You can find a technical paper about
this proof, by Qiang Zhang and me, in the 2005 proceedings of TPHOLs.
Slide 7 - A Graph with Some Explored Paths
I decided to prove the correctness of Edsger Dijkstra's shortest path
algorithm, using ACL2, the theorem prover developed by Matt Kaufmann and me.
Of course, I knew the proof. By the time I sat down at the keyboard with
ACL2, I had thought about the proof occasionally for several days.
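For those who do not know the algorithm, here is a sketch in Python (an illustration only; the ACL2 formalization itself is in the paper). The assertion marks the invariant at the heart of the correctness proof: when a node is finalized, its recorded distance can never again be improved.

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbor, weight), ...]} with weights >= 0.
    Returns the map of shortest distances reachable from source."""
    dist = {source: 0}
    done = set()
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node in done:
            continue                      # stale queue entry
        done.add(node)
        # Invariant of the correctness proof: d is the true
        # shortest distance from source to `node`.
        assert d == dist[node]
        for neighbor, weight in graph.get(node, []):
            new = d + weight
            if new < dist.get(neighbor, float("inf")):
                dist[neighbor] = new
                heapq.heappush(queue, (new, neighbor))
    return dist
```

The ten lines of code are trivial to write; spelling out why that one assertion always holds is the weekend-long part.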
I gave myself a weekend, when my wife was away. I started
Saturday morning. Ten hours later, I was not done. I was
frustrated by how many obvious facts I had to prove and how
stupid the theorem prover was. Every fact had to be spelled
out. So I quit for the day -- that is, I
quit sitting at the keyboard. But of course, by then I had so
internalized the problem that I thought about it while I
made dinner. I thought about it while I ate. I thought about
it while I watched TV. I thought about it while I slept.
I sat down at the keyboard again the next morning. Within a
few hours I was so frustrated that I got up and went
outside to do some gardening.
It was summer -- in Texas -- the sun was blazing --- the
temperature was about 36 degrees C. There was a bush in our
garden that we wanted removed. Using an ax, I chopped it out.
When I had finished -- exhausted -- I went back into the
house, I took a shower, I sat down, and I finished the proof
in a few more hours.
Slide 8 - A Desert Plant (but not in my garden!)
So what does this story tell us about the experience of an
expert using a state-of-the-art theorem prover to verify a
classic algorithm designed by a man who valued simplicity
above all else?
It tells us that the experience is painful and frustrating
compared to chopping out bushes in the blistering heat of
a Texas summer.
The state of the art is still pretty poor!
Those of us in the so-called ``Boyer-Moore'' community have
been trying to build an empowering program verification
system for 35 years.
In my paper I describe some of the basic assumptions of the Boyer-Moore
community and I provide evidence for why they may permit the construction of
an empowering program verification system. I think of our work as an
existence proof: it is possible to build a program verification system
powerful enough to permit its adherents to use it to build
the software they need.
Here is a chart I prepared in 1996, when I was interviewing
for a job at UT Austin. At the time, I worked at
Computational Logic, Inc.
Slide 9 - Boyer-Moore Project 1971-96
It starts when Boyer and I met in 1971 and has some of the
highlights of our work and our students' work.
The highlight of 1971 was the associativity of append.
In 1973, it was insertion sort.
And so it goes.
In 1988, it was the Piton assembler.
In 1990, it was the full CLI stack.
In 1993, it was the Motorola CAP digital signal processor.
In 1995, it was floating point division on the AMD K5 -- before fabrication.
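The earliest milestones on that chart are small enough to restate as executable properties. Here is a Python illustration (not the Boyer-Moore prover's Lisp logic): recursive append and insertion sort, with the 1971 and 1973 theorems checked on examples.

```python
def app(xs, ys):
    """List append, defined recursively as in the 1971 theorem."""
    return ys if not xs else [xs[0]] + app(xs[1:], ys)

def insert(x, xs):
    """Insert x into a sorted list xs, keeping it sorted."""
    if not xs or x <= xs[0]:
        return [x] + xs
    return [xs[0]] + insert(x, xs[1:])

def isort(xs):
    """Insertion sort -- the 1973 milestone."""
    return [] if not xs else insert(xs[0], isort(xs[1:]))

# Associativity of append, checked on examples here; the theorem
# prover establishes it for ALL lists, by induction.
for a, b, c in [([], [1], [2]), ([1, 2], [3], [4, 5])]:
    assert app(app(a, b), c) == app(a, app(b, c))

assert isort([3, 1, 2, 1]) == sorted([3, 1, 2, 1])
```

The distance from these ten-line theorems to a fabricated floating point unit is the distance the chart records.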
Slide 10 - Some Companies where ACL2 Has Been Used
Since 1996, we've done software verification work for a
number of companies -- verifying code that is crucial to the
correctness of products.
Slide 11 - High-Level Architecture of ACL2
Here is a picture that shows the key parts of the system:
the user, the theorem prover, and the database of proved theorems.
But one of the most important decisions we made was to build
a system that we could imagine could be used to verify its
own important properties.
Slide 12 - High-Level Architecture of ACL2 (revealed)
Don't get hung up on the apparent logical challenges. I'm
not suggesting that we build a system that can correctly
establish its own correctness.
I mean we should build a system in some programming language
for which we have a mechanically supported formal semantics
and a mechanically supported reasoning engine -- and that we
be able at least to state within the system what it means for
our system to be correct.
If you are working on a smaller piece of the problem -- if
you are building a system whose expressive power and
implementation is beyond the scope of your own work -- then
you should find somebody to deal with the problem you are
creating! When that process reaches transitive closure, we
will have a community working on a grand challenge.
Another good lesson of the Boyer-Moore community is that we
should present the user of our system with only one
language. That language, of necessity, must be both an
executable programming language and a mathematical logic.
In my view, this eliminates a lot of cognitive dissonance
and simplifies the problem. In our case, the language is
the functional subset of ANSI standard Common Lisp.
But the particular language is unimportant -- as long as it
is seamlessly integrated into the reasoning engine so that
no other language is ever encountered.
Code in that language is executed to perform the
computations the user cares about.
This is an incredibly important point, so I will say it again:
Code in that language is executed to perform the
computations the user cares about.
One implication is that the execution environment must be
efficient, rugged, and scalable -- capable of dealing with
data of practical size and complexity. If the
straightforward way of expressing an algorithm is too
inefficient, the user must have recourse to other ways of
expressing it and it must be possible to reason with the
system to show that the efficient expression performs the
same computation.
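That pattern can be made concrete with a toy (Python, illustrative only, not ACL2): a straightforward definition that is too slow, an efficient reformulation, and the equivalence obligation one would then discharge with the reasoning system.

```python
def fib_spec(n):
    """The straightforward definition: clear, but exponential time."""
    return n if n < 2 else fib_spec(n - 1) + fib_spec(n - 2)

def fib_fast(n):
    """The efficient version: linear time, constant space."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# The proof obligation: for all n, fib_fast(n) == fib_spec(n).
# Here we merely test it; the reasoning engine proves it by
# induction, after which only fib_fast need ever be executed.
assert all(fib_fast(n) == fib_spec(n) for n in range(15))
```

The point is that both definitions live inside the one language, so the equivalence is a theorem of the system, not a side agreement.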
I personally regard it as a failure of our whole school of
thought to have to escape from the reasoning system to do a
computation efficiently enough. Such dilemmas have forced
us, time and time again, to extend the language supported by
the reasoning tool.
We have verified code that deals with gigabytes of data.
We have verified code that manipulates memory reads and
writes at nearly C speeds.
The world's first silicon Java Virtual Machine was first modeled in ACL2.
And the model executes at 90 percent of the speed of the C model.
Slide 13 - Rockwell-Collins / aJile Systems JEM1
Our models are compared to other engineering models by simulating
them side-by-side. In these comparisons, the formal models are
executed millions and millions of times.
Slide 14 - AMD Floating Point Design Process
Executability is crucial. And remember, I do not mean just
the theoretical burden of reducing ground terms to
constants. I mean the practical burden of doing it
efficiently. I mean what I said before:
the language is used to perform the computations you care about.
The unified language approach also means that the properties
that the user verifies are expressed in the language.
Here I have to clarify our rules a little -- and let me be
clear that these are the rules I chose to follow, not ones I'm
insisting you follow. There are
some constructs in the language that cannot be
executed. In our system some function symbols
are merely constrained by axioms. These constrained symbols
allow the user to express existential quantification and to
specify systems without over-specifying them.
So not every expression in the language is an executable
program. But every executable program is an
expression in the language.
The unified language approach also means that to interact
with the reasoning engine, the user writes expressions in
the language. This means that hints, guidance, extensions,
debugging information, etc., are all communicated via the language.
In our system, this is helped by the fact that the entire
reasoning environment -- including the engine itself -- is
written in the language. Thus, its internal data structures
are just constants in the language and everything is visible
and manageable by the user who speaks the language.
This has implications for the use of external tools.
Ideally, they should be implemented in the same environment
so that -- to the extent that they are open to the user --
their internal data structures can be understood.
I know this flies in the face of much common wisdom about
modularity and software system design. But it helps
illustrate, scientifically, an argument for why this is a
grand challenge: we must find a system design methodology
that permits a seamless, fully integrated system to be built
by a diverse group of people.
This also highlights why we are the people to do this: if we
are right that formal specification and proof provide a
practical way to build reliable software, then we
ought to be able to use our tools to build the tools we need.
Finally, it will be necessary from time to time to escape
the system and descend to a lower level language -- or even
to the machine architecture.
This can be done. It was done in the CLI stack about 20 years ago!
Slide 15 - The CLI Verified Stack (OS kernel and applications not shown)
Let me pause here. If you don't know about the verified
stack of system components built by Computational Logic,
Inc, in the late 1980s, you should. If your students don't
know about it, they should. In my view, you cannot hope to
deal with software verification unless you can deal with
multiple layers of abstraction in a practical way. The CLI
stack moved from a micro-architectural view of a fabricated
microprocessor, through the ISA, an assembler, and two
compilers, to high level applications and an operating
system kernel -- all verified mechanically. I cannot tell
you how many students I have met around the United States
who are working on one piece of that problem in complete
ignorance of what has already been done.
Returning to the issue of mapping to another programming
language, the obvious approach works if your tools are good
enough: formalize the semantics operationally in your system
and prove theorems about constants representing programs in
the other language.
The paper contains many, many references to successful
examples of this.
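The recipe just stated -- formalize the semantics operationally, then prove theorems about constants representing programs -- can be illustrated with a toy (a Python sketch; not Piton or the ACL2 JVM models):

```python
# A deep embedding in miniature: programs in the "other language"
# are just constants, and an interpreter gives them their
# operational semantics.

def run(program, stack):
    """Operational semantics of a tiny stack language."""
    for op in program:
        if op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:                       # ("push", n)
            stack.append(op[1])
    return stack

# A program is a constant; a theorem about it is a statement about
# what `run` does to it.  E.g., this program computes (2 + 3) * 4:
PROG = [("push", 2), ("push", 3), "add", ("push", 4), "mul"]
assert run(PROG, []) == [20]
```

Scale the interpreter up to a real ISA or virtual machine and you have exactly the interpreter-style proofs discussed next.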
You may think that deep embedding is unnecessarily awkward
and burdensome. I don't believe so. In the first place,
the reasoning capability required for interpreter-style
proofs is no different than any other. It just stresses the
scale and performance -- but we're already committed to that.
In the second place, it positions the project perfectly for
the step of enlarging the scope of the reasoning engine to
deal with other programming languages: all that is required
is to formalize the semantics of those languages.
In a paper published last year in the Journal of the Spanish Academy of
Science, Matt Kaufmann and I discussed the eight key research problems
we see standing in the way of the grand challenge.
1. Automatic Invention of Lemmas and New Concepts
This is the problem. How do we invent proofs --
especially inductive proofs?
2. How to use Examples and Counterexamples
I believe this problem is part of the solution of the
first one. I believe people frequently use examples to
guide the search for a proof -- and more often use
examples to guide the search for accurately stated
theorems and inductive invariants.
3. How to use Analogy, Learning, and Data Mining
How do we learn new concepts? How do we come to
understand their patterns of motion in terms? Proof by
analogy is often a reasonable substitute for
higher-order features. That is, instead of stating a
higher-order general scheme, prove the instances
automatically as needed. How do we learn what moves are
likely to work in a given context and what moves won't?
4. How to Architect an Open Verification Environment
In any sufficiently automatic system, the system's
strategy will occasionally thwart the proof strategy the
user has in mind. It must be possible to override
virtually every automatic decision -- in a sound way.
This is also the place where we will address the problem
of plugging together different tools -- other provers,
decision procedures, static analyzers, etc.
5. Parallel, Distributed and Collaborative Theorem Proving
Big proof projects involve many people working on
interconnected parts of the problem. Big projects
require ``proof maintenance'' where previously
constructed proofs must be re-constructed for slightly
different models and specs. How do we support
distributed proof development and maintenance?
6. User Interface and Interactive Steering
How do we combine a fully automatic proof search engine
with one that allows the user to supply proof sketches,
hints, and strategic guidance? How do we inform the
user of what the system is doing so the user can help --
how can we do that without drowning the user in a sea of detail?
7. Education of the User Community -- and Their Managers
Even if we succeed in building an empowering
verification environment, we have to motivate and
educate people to use it.
8. How to Build a Verified Theorem Prover
At first sight, this goal may not seem to be on the
critical path. However, it is difficult to sell a
product you don't use -- and that includes formal verification tools.
Remember, this paper asks the questions. It doesn't answer them.
In summary, my main advice is simply to accept the grand
challenge of formal methods itself.
Build only those things you can imagine specifying and
verifying -- and if your group cannot handle its own
system, then find some partners who can build the part you can't.
Our community is perfectly positioned as the vanguard of the
revolution. If we accept the challenge, we are likely to be
the first group to build a machine that reasons.
All of you using mechanized deduction engines have
experienced the thrill of having a machine tell you that you
were wrong about something you firmly believed!
You may have experienced the elation of having a machine
find a proof before you did.
You have felt the relief of knowing that the burden of
soundness is on the machine.
You were the first in the world to know what it feels like
to have a machine to reason with -- to educate, to explain
to, to argue with, to be enlightened by.
If we accept the grand challenge we could bring that
experience to millions of programmers and truly begin to
revolutionize human thought.