- Unique Id: 53105
- Meets: MW 3:30 - 5:00 PAI 3.14
- Instructor: J Strother Moore
- CV: here
- Discussion Board: Piazza (Semester: Spring 2012; Class Name: CS 378 JVM)
- Office: MAI 2014
- Email: moore@cs.utexas.edu
- Office Hours: MW 1:00 - 2:00 MAI 2014
- Midterm Test: Mar 7, in class (last class day before Spring Break)
- Last Test: May 2, in class (last class day of the semester)

- Jan 18: Read A Gentle Introduction to ACL2 Programming before the Jan 23 class meeting and come to that meeting prepared to define some simple ACL2 functions.
- Jan 23: Lecture 2 and definition of the `M1` package.
- Jan 24: In Lecture 3 you will find the descriptions of eight M1 instructions. The first, `ILOAD`, leads us to write

      (defun execute-ILOAD (inst s)
        (make-state (+ 1 (pc s))
                    (locals s)
                    (push (nth (arg1 inst) (locals s)) (stack s))
                    (program s)))

  Define the other seven instructions and come prepared on Monday to present them at the board.
- Jan 30: We have defined seven of the eight instructions requested in
  Lecture 3. In particular, we defined the semantics of `ILOAD`, `ICONST`, `IADD`, `ISUB`, `IMUL`, `ISTORE`, and `GOTO`. Define the last instruction, `IFEQ`, and then answer the questions on pages 18 and 19 of the lecture. That will complete the definition of M1! I strongly urge you to build your own M1 and run it on some example programs.
- Feb 1: Warren Hunt lectured today on how to prove

      (implies (and (natp x) (natp y))
               (equal (g x y 0) (* x y)))

  where

      (defun g (x y a)
        (if (zp x)
            a
            (g (- x 1) y (+ y a))))

  The proof required two key steps: (a) generalization from `(g x y 0)` to `(g x y a)`, and (b) induction. Here is an ACL2 script and the session log. You should be able to use the ACL2 system well enough to replay the script!
- Feb 6: We finished the definition of M1 and wrote and tested one simple M1 program, namely
  a program for computing `g` above. Specifically, we defined

      (defconst *g-program*
        '((ICONST 0)   ;  0
          (ISTORE 2)   ;  1  a = 0;
                       ; loop:
          (ILOAD 0)    ;  2
          (IFEQ 10)    ;  3  if x=0 then go to end;
          (ILOAD 0)    ;  4
          (ICONST 1)   ;  5
          (ISUB)       ;  6
          (ISTORE 0)   ;  7  x = x-1;
          (ILOAD 1)    ;  8
          (ILOAD 2)    ;  9
          (IADD)       ; 10
          (ISTORE 2)   ; 11  a = y+a;
          (GOTO -10)   ; 12  go to loop
                       ; end:
          (ILOAD 2)    ; 13
          (HALT)       ; 14  ``return'' a (actually, halt with a on stack).
          ))

  and tested it on several inputs. This program may be described as taking natural number inputs `x` and `y` (in locals 0 and 1) and computing `x*y` by repeatedly adding `y` to an accumulator, `a`, that is initially 0. The program “returns” the final value of `a` by leaving it on the stack and halting. Compare `*g-program*` to the JVM bytecode produced for the method `g` in the Java file `Demo.java` below:

      class Demo {
        public static int g(int x, int y){
          int a = 0;
          while (x!=0) {x = x-1; a = y+a;}
          return a;
        }
        public static void main(String[] args){
          int x = Integer.parseInt(args[0], 10);
          int y = Integer.parseInt(args[1], 10);
          System.out.println(g(x,y));
          return;
        }
      }

  by compiling `Demo.java` and inspecting the bytecode with

      javap -c Demo

  To test `*g-program*` (1) follow the instructions here to get the full definition of M1 available to your ACL2, (2) start your ACL2 and do:

      (include-book "m1/m1") ; The first `m1/' indicates the directory where you installed m1.lisp.
      (in-package "M1")      ; To get into the M1 package.

  and finally (3), try something like:

      (run (repeat 'tick 50) (make-state 0 (list 4 5) nil *g-program*))

  to run the program on inputs 4 and 5. Inspect the final state and see if it meets your expectations. Try other inputs. I also recommend that you define and test other simple M1 programs, like factorial, sum the numbers below n, and perhaps one that indicates whether a natural number is even or odd by pushing 1 or 0 on the stack.
- Feb 8: I went through the first 72 pages of lecture06.pdf. (However, I added pages 4-8 after the lecture!) I recommend that you review them and look ahead to what we'll do next. I gave two ACL2 demos; the scripts for the demos are here.
- Feb 13: I continued presenting lecture06.pdf (from Feb 8, above). I have fixed two subscripting typos, as promised. We got to page 128 in the pdf file.
- Feb 15: I finished lecture06.pdf. (In the course of the lecture -- and without mentioning it to the class -- I noticed a typo in the proof of the inner loop, starting on page 138. I've fixed it.) I have posted the script used to verify `*g-program*`. It is written as a 7-step template. I recommend that you (a) replay this template for `*g-program*` in your ACL2 and then (b) modify it repeatedly to verify some other simple M1 programs.
- Feb 20: We worked an example M1 proof in
class. I started with the template, above, and edited it so that it was a
proof of the total correctness of a bit-flipping even/odd program. I also
announced that
*there will be no class either Monday, Feb 27 or Wednesday, Feb 29*. However, I urge you to do the challenge examples mentioned above! We will have class on Monday, March 5, and then on Wednesday, March 7, we will have the midterm. It would be best for you to spend March 5 asking me about lessons learned on the challenge examples.
- Feb 22: I described the midterm exam (to be held in class on March 7) as
  having 5-6 questions concerning:
  - extending M1 with new instructions
  - formalizing certain ideas commonly used, e.g., “the number of items on the stack of state `s`” and “the indices of the variables (possibly) written by the program in `s`.” The latter example is answered perfectly by `(all-written-locals (program s))`, where

        (defun all-written-locals (program)
          (cond ((endp program) nil)
                ((equal (op-code (car program)) 'ISTORE)
                 (cons (arg1 (car program))
                       (all-written-locals (cdr program))))
                (t (all-written-locals (cdr program)))))

  - formalizing the schedule function for a simple M1 program, and
  - formalizing the theorems needed to prove a simple algorithm correct (e.g., the `helper-is-theta` and `fn-is-theta` theorems) from the template.

  I also noted that the midterm handout will include `m1.lisp`, `m1-support.lisp`, and `template.lisp`. The test will be open notes, but I urge you not to cut down a lot of trees! The handouts listed above are almost certain to contain all the examples you will need.

  We then looked at some instructions in The Java Virtual Machine Specification and worked out how their analogues could be formalized in the context of M1, specifically `IF<cond>`, `IF_ICMP<cond>`, `DUP`, `DUP_X1`, `SWAP`, `JSR`, and `RET`.

  Finally, I briefly talked about stack maps, but we'll cover them in more detail in the future. They will not be on the exam, but the algorithm for computing them requires the kind of formalization skills previously mentioned.

- Mar 5: I answered questions about the midterm and I worked the M1 Fibonacci example. The Fibonacci example is somewhat more complicated than any I'd expect you to do on the midterm but it has two important features: the loop has more than one `IFEQ` in it (so the schedule is more complicated) and there are two distinct exit points (so the specification of the final pc is more complicated). But the moral is simple: use `IF` to say what is true. Remember the midterm is this coming Wednesday, Mar 7. See the description of the midterm in the bullet above.
- Mar 7: The Midterm Exam was taken today. The answers are here. In addition, the answers are elaborated in two proof files that demonstrate the correctness of
  the answers to problems (5) and (6); see m1/proofs/midterm-pi.lisp and
m1/proofs/power.lisp.
- Mar 19 and 21: I gave a lecture on The Turing Equivalence of M1, showing how I proved, with ACL2, that M1 can
simulate a Turing machine. As noted in the lecture, the whole point of having formal machine models is to allow us to prove things about them and this
lecture illustrates a particularly interesting property of M1, proved formally.
- Mar 26: I introduced M5. M5 is a more complete model of a JVM-like
machine. It supports multiple threads, a heap containing Objects that are
instances of classes, a variety of method invocations, synchronization
primitives, and exceptions.
By way of comparison, M1, which we have been studying until now, is less than 4 pages of ACL2 code. M5 is about 28 pages of ACL2 code. It will allow us to study the formalization of the features mentioned above while still being something you can master in the time remaining. Our most accurate model of the JVM, M6, is 160 pages of ACL2 code plus about 500 pages of built-in classes from the Java API.

  I have posted M5 at m5/README.html. I urge you to certify those books on your local version of ACL2, then:

      (include-book "m5/m5")
      (in-package "M5")
      (defconst *s*
        '(STATE
          (:TT ((THREAD (:ID 0)
                        (:CS ((FRAME (:PC 0) (:LOCS NIL) (:STK (5)) (:MLOC ("Math" "main" 2)))))
                        (:STAT ACTIVE)
                        (:REF NIL))))
          (:HP NIL)
          (:CT ((CLASS (:NAME "Object") (:SUPERS NIL)
                       (:FIELDS ("monitor" "mcount" "wait-set"))
                       (:METHODS NIL))
                (CLASS (:NAME "Thread") (:SUPERS ("Object")) (:FIELDS NIL)
                       (:METHODS ((METHOD (:NAME "run") (:FORMALS NIL) (:SYNC NIL) (:CODE ((RETURN))) (:XTBL NIL))
                                  (METHOD (:NAME "start") (:FORMALS NIL) (:SYNC NIL) (:CODE NIL) (:XTBL NIL))
                                  (METHOD (:NAME "stop") (:FORMALS NIL) (:SYNC NIL) (:CODE NIL) (:XTBL NIL)))))
                (CLASS (:NAME "Math") (:SUPERS ("Object")) (:FIELDS NIL)
                       (:METHODS ((METHOD (:NAME "fact") (:FORMALS (N)) (:SYNC NIL)
                                          (:CODE ((LOAD 0) (IFEQ 8) (LOAD 0) (LOAD 0) (CONST -1) (ADD)
                                                  (INVOKESTATIC ("Math" "fact" 1)) (MUL) (XRETURN)
                                                  (CONST 1) (XRETURN)))
                                          (:XTBL NIL))
                                  (METHOD (:NAME "app") (:FORMALS (X Y)) (:SYNC NIL)
                                          (:CODE ((LOAD 0) (LOAD 1) (INVOKESTATIC ("Math" "fact" 1)) (ADD) (XRETURN)))
                                          (:XTBL NIL))
                                  (METHOD (:NAME "main") (:FORMALS NIL) (:SYNC NIL)
                                          (:CODE ((INVOKESTATIC ("Math" "fact" 1)) (INVOKESTATIC ("Math" "app" 2))))
                                          (:XTBL NIL)))))))))

  and then practice accessing pieces of this state by choosing a piece to target and then evaluating an expression that is supposed to return it. For example, the “mloc of the top frame on the call stack of thread 0” is computed by either

      (mloc (top (cs (find :id 0 (tt *s*)))))

  or

      (get :mloc (top (get :cs (find :id 0 (tt *s*)))))

  both of which return `("Math" "main" 2)`.

  Make up examples of your own and test your thinking.
- Apr 2: I gave a lecture on
  `INVOKESTATIC` and `xRETURN`. The example I used and the commands to “snapshot” certain intermediate states are here. I recommend that you think of a simple, recursive arithmetic function (e.g., Fibonacci, sum of squares, greatest common divisor, etc.) and code it up as a *recursive* method in M5 and then snapshot its execution to cement your understanding of the basic call and return mechanism. Then in your local model of M5 you might change the notion of method resolution so that the method invoked is influenced by the number of arguments in the descriptor provided to `INVOKESTATIC`. That is, instead of finding a method of the given name in the superclass chain of the given class, find the method of a given name and input arity in the superclass chain. Then code an example class with two methods with the same name and demonstrate that your semantics resolves to the “right” method.
- Apr 4: I gave a lecture on
  `NEW`, `GETFIELD`, and `PUTFIELD`. The example I used and the commands to snapshot certain intermediate states are here. However, I modified the script from that used in the lecture so that (a) I popped the extraneous 1 off the stack after demonstrating the behavior of `GETFIELD` and (b) I eliminated certain intermediate states in the snapshots, just showing the before and after shots around the relevant `NEW`, `PUTFIELD`, and `GETFIELD` instructions.
- Apr 9: I gave a lecture on
  `INVOKEVIRTUAL` and `INVOKESPECIAL`. The M5 demonstration I gave is here. The comments in the demo mention two Java files, Point.java and ColoredPoint.java, that, when compiled with javac, illustrate actual JVM bytecode similar to (but not identical to) the M5 demo. At the end of the lecture script is a script that answers the challenge raised last time: how do you build a circular Object of *n* linked Objects on M5?
- Apr 11: I gave a lecture on Threads. The lecture was dominated by a demonstration of
a pointless M5 program that starts two threads which endlessly increment a thread-local
variable. That demonstration is obtained by stepping various threads starting in the state
  `*r*` of lecture16.lisp. Then I presented a puzzle in which two threads are endlessly competing for a single resource, a counter initially 1, where each thread reads the counter twice, adds the results, and stores the sum back to the global counter. The question is whether you can invent a schedule that will make this system, called `*s*` in lecture16.lisp, produce any given natural number `n` by defining `(schedule n)` appropriately. I urge you to think about this -- and not to search for the answer online! Furthermore, when you think about it, don't think about M5 state `*s*` but think about the problem in a more abstract setting. If you think you know how to solve it, formalize in ACL2 a simple model of this problem (from scratch, not involving M5) and test your solution there. It is almost always pointless to analyze actual code, even for a machine as simple as M5, before you've analyzed an appropriate abstraction. Indeed, harking back to our M1 code proofs: first you prove the algorithm correct, then you prove that the M1 code implements the algorithm. So for this example the lesson is: first you produce an abstract model of the puzzle and solve it, then you implement that solution in M5 bytecode.
- Apr 16: I gave a lecture on the JVM synchronization
  primitives, `MONITORENTER` and `MONITOREXIT`. The M5 demo I gave is lecture17.lisp. I also discussed the Apprentice Challenge, which involves proving that a certain trivial Java program achieves mutual exclusion but exposes a delicacy: one must not change the reference object of a thread once the thread has been started.
- Apr 18: I gave a lecture on the JVM primitives for handling exceptions, namely `THROW` and the exception table associated with every method. The M5 demo I gave is lecture18.lisp.
- Apr 23: I discussed how to prepare for the Final Test, which is Wednesday, May 2,
in class. In particular, I asked and we answered a practice final. A sample state,
the questions, and our answers are given here.
- Apr 25: I discussed M6, the most accurate ACL2 model to date of the Java
Virtual Machine.
- Apr 30: I discussed the recently completed proof by Erik Toibazarov and me that
an M3 bytecode implementation of the preprocessing for a version of the Boyer-Moore
fast string searching algorithm is correct. Erik's honors thesis is
here.
- May 2: Final test. The supporting material handed out for the
final test included a sample M5 state, and the definition
of M5 as found in m5/m5.lisp. Here are my answers,
although yours may differ and still be correct. The grade distribution is here.
- The End
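For the Apr 11 counter puzzle, the advice was to model the problem abstractly before touching M5. The class was asked to do that in ACL2; purely as an illustration of what such an abstract model might look like, here is a rough Java sketch. The class name `CounterPuzzle`, the encoding of a schedule as an array of thread ids, and the choice to treat each read and each write as one atomic step are all assumptions of this sketch, not the course's formalization. It deliberately does not answer the puzzle; it only lets you try schedules.

```java
// Illustrative sketch only: an abstract model of the two-thread counter puzzle.
// Names and the three-step thread cycle are assumptions, not the course's M5 code.
class CounterPuzzle {
    // Runs the given schedule (a sequence of thread ids, each 0 or 1) and
    // returns the final value of the shared counter, which starts at 1.
    static long run(int[] schedule) {
        long counter = 1;
        int[] phase = new int[2];      // where each thread is in its read/read/write cycle
        long[] first = new long[2];    // result of each thread's first read
        long[] second = new long[2];   // result of each thread's second read
        for (int t : schedule) {
            switch (phase[t]) {
                case 0:  first[t] = counter; break;              // first read
                case 1:  second[t] = counter; break;             // second read
                default: counter = first[t] + second[t]; break;  // write the sum back
            }
            phase[t] = (phase[t] + 1) % 3;
        }
        return counter;
    }

    public static void main(String[] args) {
        // Thread 0 completes a cycle, then thread 1 completes a cycle.
        System.out.println(run(new int[] {0, 0, 0, 1, 1, 1}));  // prints 4
        // The two threads alternate strictly, so both read the initial value.
        System.out.println(run(new int[] {0, 1, 0, 1, 0, 1}));  // prints 2
    }
}
```

The two sample schedules already give different results, which is the whole point of asking what `(schedule n)` can achieve.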

We will study a formal specification of the Java Virtual Machine (JVM). The JVM is a stack-based, object-oriented, type-safe bytecode (assembly language) interpreter on which compiled Java programs are executed.

But the focus of the course will be on teaching you how to formalize a comparably complicated computing artifact and how to subsequently use that formalization. That is, we will be more interested in formalization techniques than in the JVM specifically.

You will learn about four different things: how to make a mathematical model of a complicated digital artifact like the JVM, how to program in a simple functional language, how to reason about such models, and how to use a powerful automatic reasoning tool.

- The Java Virtual Machine Specification. Tim Lindholm, Frank Yellin. Addison-Wesley. ISBN 0-201-63452-X (or latest edition). Sun (now Oracle) distributes this *for free*.
- Computer-Aided Reasoning: An Approach. Matt Kaufmann, Panagiotis Manolios, J Moore. This will also be available for sale at cost ($20) from the instructor during the first week of class. (This textbook is optional. I think you can get along without it. Just ask me questions and learn to use the online documentation!)

- We will use the ACL2 theorem prover. If you want to run it on a UTCS Linux machine, type `/p/bin/acl2`. Most users install it on their laptops (see below). And most users either run it in an Emacs shell buffer or via an Eclipse (“ACL2 Sedan”) interface (see below).
- The ACL2 Home Page, which includes sources, installation instructions, and user documentation for the programming language and the theorem prover
- Eclipse ACL2 Sedan installation instructions
- Eclipse ACL2 Sedan Installation FAQ
- How to Use the ACL2 Sedan
- A Gentle Introduction to ACL2 Programming
- ACL2 Programming Exercises
- Introduction to the Theorem Prover

- 30% — attendance in class
- 25% — class participation (asking questions, answering questions, presenting solutions at the board) and office visits
- 15% — Midterm Test, Wednesday, March 7, 2012
- 35% — Last Test, Wednesday, May 2, 2012

*Extra Credit*: Extra credit will be given for projects presented at the end of the
semester. Possibilities for projects will be discussed
from time to time in class. If you have a project proposal, discuss it with me *before* you invest
time in it. You may work with others on projects.

Upper-division standing is required for all CS378 classes.

If the question is “What do I have to *know* in order to do well
in this course?” as opposed to “What are the university
rules?” the answer is: mathematical logic, including induction, and
some experience programming in some language, preferably Java. You should be
able to use Eclipse or Emacs. Experience with Lisp or ACL2 is helpful but
the subset we use is relatively small and will be taught (quickly).

See How to Use ACL2s to get started.

We'll approach the JVM model incrementally, starting with a very simple (suggestive but inaccurate) model. Then we will extend and revise it repeatedly toward a more accurate description of the JVM. We'll learn the necessary functional programming and proof techniques by building the simplest model. Most of the semester will be spent extending and exploring more elaborate models.
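To give a feel for how small the simplest model is, here is a rough Java sketch of an M1-style machine running the `*g-program*` from the Feb 6 class entry. The real M1 is written in ACL2 and is the only authoritative definition; the class name `M1Sketch`, the `Inst` representation, the fixed-size locals array, and the choice to throw if `HALT` is not reached are all assumptions made for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only: a tiny M1-like stack machine in Java.
// The course's M1 is defined in ACL2; the names and layout here are assumptions.
class M1Sketch {
    // One instruction: an opcode name plus an integer operand (0 if unused).
    static final class Inst {
        final String op;
        final int arg;
        Inst(String op, int arg) { this.op = op; this.arg = arg; }
        Inst(String op) { this(op, 0); }
    }

    // The program from the Feb 6 entry. Jump operands are offsets
    // relative to the pc of the jump instruction itself.
    static final Inst[] G_PROGRAM = {
        new Inst("ICONST", 0),  //  0
        new Inst("ISTORE", 2),  //  1  a = 0
        new Inst("ILOAD", 0),   //  2  loop:
        new Inst("IFEQ", 10),   //  3  if x == 0, go to end
        new Inst("ILOAD", 0),   //  4
        new Inst("ICONST", 1),  //  5
        new Inst("ISUB"),       //  6
        new Inst("ISTORE", 0),  //  7  x = x - 1
        new Inst("ILOAD", 1),   //  8
        new Inst("ILOAD", 2),   //  9
        new Inst("IADD"),       // 10
        new Inst("ISTORE", 2),  // 11  a = y + a
        new Inst("GOTO", -10),  // 12  go to loop
        new Inst("ILOAD", 2),   // 13  end:
        new Inst("HALT")        // 14  halt with a on the stack
    };

    // Executes at most maxSteps instructions; returns the top of the stack at HALT.
    static int run(Inst[] program, int[] locals, int maxSteps) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        for (int step = 0; step < maxSteps; step++) {
            Inst inst = program[pc];
            switch (inst.op) {
                case "ILOAD":  stack.push(locals[inst.arg]); pc++; break;
                case "ICONST": stack.push(inst.arg); pc++; break;
                case "ISTORE": locals[inst.arg] = stack.pop(); pc++; break;
                case "IADD": { int b = stack.pop(), a = stack.pop(); stack.push(a + b); pc++; break; }
                case "ISUB": { int b = stack.pop(), a = stack.pop(); stack.push(a - b); pc++; break; }
                case "IMUL": { int b = stack.pop(), a = stack.pop(); stack.push(a * b); pc++; break; }
                case "GOTO":   pc += inst.arg; break;
                case "IFEQ":   pc += (stack.pop() == 0) ? inst.arg : 1; break;
                case "HALT":   return stack.peek();
                default: throw new IllegalArgumentException("unknown opcode " + inst.op);
            }
        }
        throw new IllegalStateException("did not reach HALT within " + maxSteps + " steps");
    }

    // Runs G_PROGRAM with x and y in locals 0 and 1; local 2 holds the accumulator.
    static int runG(int x, int y) {
        return run(G_PROGRAM, new int[] {x, y, 0}, 11 * x + 6);
    }

    public static void main(String[] args) {
        System.out.println(runG(4, 5));  // prints 20
    }
}
```

Unlike this sketch, the ACL2 model treats the whole machine state as a value and each instruction as a function from state to state, which is what makes it possible to prove theorems about it rather than just run it.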

You will be expected to do much of the formalization work here and extra-credit project ideas may come out of our discussions. For example, good projects might include the formalization or elaboration of features not discussed in class or the mechanized proofs of some of the properties discussed.

We will adhere pretty closely to the following sequence of topics. But since many classes will be presentations by students in answer to questions raised by the instructor, the pace may vary somewhat.

**All dates below speculative.**

| Date | Topic |
|---|---|
| Wed, Jan 18 | Introduction |
| Mon, Jan 23 | building M1 -- functional programming in ACL2 |
| Wed, Jan 25 | building M1 -- functional programming in ACL2 |
| Mon, Jan 30 | building M1 -- functional programming in ACL2 |
| Wed, Feb 1 | reasoning about M1 “by hand” |
| Mon, Feb 6 | reasoning about M1 “by hand” |
| Wed, Feb 8 | quick introduction to how ACL2's prover works |
| Mon, Feb 13 | mechanized proofs about M1 |
| Wed, Feb 15 | mechanized proofs about M1 |
| Mon, Feb 20 | mechanized proofs about M1 |
| Wed, Feb 22 | mechanized proofs about M1 |
| Mon, Feb 27 | the class table, the heap, and threads |
| Wed, Feb 29 | macros for managing an elaborate state |
| Mon, Mar 5 | M5: a fairly realistic JVM model |
| Wed, Mar 7 | Midterm Test |
| Mon, Mar 12 | Spring Break |
| Wed, Mar 14 | Spring Break |
| Mon, Mar 19 | object creation and manipulation |
| Wed, Mar 21 | method resolution and invocation |
| Mon, Mar 26 | threads and monitors |
| Wed, Mar 28 | M5 |
| Mon, Apr 2 | mechanized proofs about M5 |
| Wed, Apr 4 | mechanized proofs about M5 |
| Mon, Apr 9 | mechanized proofs about M5 |
| Wed, Apr 11 | mechanized proofs about M5 |
| Mon, Apr 16 | extending M5 |
| Wed, Apr 18 | extending M5 |
| Mon, Apr 23 | extending M5 |
| Wed, Apr 25 | M6: an accurate JVM model |
| Mon, Apr 30 | M6: an accurate JVM model |
| Wed, May 2 | Last Test |

A *mathematical logic* is a formal system consisting of a precisely
defined *syntax*, some *axioms*, and some *rules of inference*. The axioms are
just formulas in the syntax: formulas that are taken to be “always true.”
The rules of inference are formula transformers that preserve truth. A
*theorem* is a formula that can be derived from the axioms by applying
the rules of inference. A theorem is thus “always true.” By modeling
a computing system in a mathematical logic we can prove theorems about it to
establish its properties.
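As a concrete instance, the Feb 1 class proved that `(g x y 0)` equals `(* x y)` for natural numbers `x` and `y`, after generalizing from `(g x y 0)` to `(g x y a)`. The Java sketch below transcribes `g` and checks one natural form of the generalized statement, `g(x, y, a) = x*y + a`, on a small finite range. I am assuming that is the form of the generalization used; the class and method names are made up for illustration. Testing a finite range is of course not a proof, which is exactly the gap a theorem prover closes.

```java
// Illustrative sketch only: finite testing of a claim that ACL2 proves for all naturals.
// The names GCheck and holdsBelow are assumptions made for this example.
class GCheck {
    // Java transcription of the ACL2 function g; (x <= 0) stands in for (zp x).
    static long g(long x, long y, long a) {
        return (x <= 0) ? a : g(x - 1, y, y + a);
    }

    // Checks g(x, y, a) == x*y + a for all 0 <= x, y, a < bound.
    static boolean holdsBelow(int bound) {
        for (int x = 0; x < bound; x++) {
            for (int y = 0; y < bound; y++) {
                for (int a = 0; a < bound; a++) {
                    if (g(x, y, a) != (long) x * y + a) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(g(4, 5, 0));      // prints 20
        System.out.println(holdsBelow(25));  // prints true
    }
}
```

The generalization matters because induction on `x` changes the accumulator at each step, so a statement that fixes the accumulator at 0 is too weak to serve as its own induction hypothesis.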

You studied formal mathematical logic in CS313K and in CS336. There you learned propositional calculus as a formal system. You also learned first order predicate calculus. You might have also learned set theory. So which mathematical logic do we use to describe the Java Virtual Machine?

The mathematical logic we use is a *functional programming language*,
Pure Lisp. If you know anything at all about Lisp, you probably think of it
as merely a programming language. But we cast it as a logic, with a
precisely given syntax, some axioms, and some rules of inference. We will
prove theorems in Lisp.

Put another way, in this course you will come to understand the JVM by studying a model of the Java Virtual Machine written in a functional programming language.

We will cover representatives of most of the JVM byte codes, including `IADD`, `ILOAD`, `ISTORE`, `IFGT`, `GOTO`, `NEW`, `PUTFIELD`, `INVOKEVIRTUAL`, and `MONITORENTER`. We will not cover the entire JVM; for example,
we will not deal with the details of arithmetic, arrays, class loading, or
native methods. However, by the end of this course you will be able to write
formal specifications of many of the omitted parts.
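As a small taste of specifying an instruction that is not among the ones listed, here is a hedged Java sketch of the operand-stack effects of `DUP`, `DUP_X1`, and `SWAP`, three of the instructions worked out in the Feb 22 class. Each is written as a function from an old stack to a new stack, in the spirit of the ACL2 models. Representing a stack as a `List<Integer>` with the top at index 0, and the names `StackOps`, `push`, `pop`, and `top`, are assumptions of this sketch; a full specification would also advance the pc and carry the rest of the state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: stack effects as functions from old stack to new stack.
// A stack is a list whose element 0 is the top; inputs are never modified.
class StackOps {
    static List<Integer> push(int v, List<Integer> stack) {
        List<Integer> result = new ArrayList<>();
        result.add(v);
        result.addAll(stack);
        return result;
    }

    static List<Integer> pop(List<Integer> stack) {
        return new ArrayList<>(stack.subList(1, stack.size()));
    }

    static int top(List<Integer> stack) {
        return stack.get(0);
    }

    // DUP: ..., v  ->  ..., v, v
    static List<Integer> dup(List<Integer> s) {
        return push(top(s), s);
    }

    // SWAP: ..., v2, v1  ->  ..., v1, v2
    static List<Integer> swap(List<Integer> s) {
        int v1 = top(s);
        int v2 = top(pop(s));
        return push(v2, push(v1, pop(pop(s))));
    }

    // DUP_X1: ..., v2, v1  ->  ..., v1, v2, v1
    static List<Integer> dupX1(List<Integer> s) {
        int v1 = top(s);
        int v2 = top(pop(s));
        return push(v1, push(v2, push(v1, pop(pop(s)))));
    }

    public static void main(String[] args) {
        List<Integer> s = Arrays.asList(1, 2, 3);  // 1 is the top of the stack
        System.out.println(dup(s));    // prints [1, 1, 2, 3]
        System.out.println(swap(s));   // prints [2, 1, 3]
        System.out.println(dupX1(s));  // prints [1, 2, 1, 3]
    }
}
```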

We will discuss the Java bytecode verifier; in particular, we will investigate its specification: what properties should it have?

The logic we use is supported by a mechanical theorem prover, ACL2. This theorem prover is in use in industry to verify properties of hardware, microcode, and software. In fact, its authors won the 2005 ACM Software System Award for the lasting influence their theorem provers have had on computer science.

This course is an unusual mixture of many CS courses. It is like CS307 in that we will be dealing with the Java programming language. It is like CS310 in that we will be looking at an assembly level language. It is like CS352 in that we will be considering the architectural features of the processor. It is like CS372 in that we will be considering process management, memory management, protection, thread scheduling, and concurrency. It is like CS313K in that we will be dealing with a formal logic. It is like CS336 in that we will be formally modeling and proving theorems about our programs and algorithms. It is like parts of CS343 in that we will be discussing mechanized reasoning.

*Religious Holy Days*: A student who is absent from an examination or
cannot meet an assignment deadline due to the observance of a religious holy
day may take the examination on an alternate day, submit the assignment up to
24 hours late without penalty, or be excused from the examination or
assignment, if proper notice of the planned absence has been given. Notice
must be given at least fourteen days prior to the classes scheduled on dates
the student will be absent. For religious holy days that fall within the
first two weeks of the semester, notice should be given on the first day of
the semester. It must be personally delivered to the instructor and signed
and dated by the instructor, or sent via certified mail, return receipt
requested. Email notification will be accepted if received, but a student
submitting such notification must receive email confirmation from the
instructor. A student who fails to complete missed work within the time
allowed will be subject to the normal academic penalties.

*Disability Related Needs*: Please notify me of any
modification/adaptation you may require to accommodate a disability-related
need. You will be requested to provide documentation to the Office of the
Dean of Students in order that the most appropriate accommodations can be
determined. Specialized services are available on campus through Services for
Students with Disabilities, SSB 4th floor, A5800, 471-6259, TTY 471-4641.

*Emergencies and Illness*: Documented emergencies and illnesses will be
dealt with by the instructor. For best results, communicate with me
*before* you miss a midterm or the final and be prepared to supply
written, verifiable evidence of the condition.