CS378 A Formal Model of the Java Virtual Machine

Spring, 2012

Unique Id: 53105
Meets: MW 3:30 - 5:00 PAI 3.14
Instructor: J Strother Moore
CV: here
Discussion Board: Piazza (Semester: Spring 2012; Class Name: CS 378 JVM)
Office: MAI 2014
Email: moore@cs.utexas.edu
Office Hours: MW 1:00 - 2:00 MAI 2014
Midterm Test: Mar 7, in class (last class day before Spring Break)
Last Test: May 2, in class (last class day of the semester)

Important Note: The tests are half of your grade and occur just before Spring Break and the end of semester. Do not miss the tests by leaving campus early!

Assignments and Supplemental Material

Jan 18: Read A Gentle Introduction to ACL2 Programming before the Jan 23 class meeting and come to that meeting prepared to define some simple ACL2 functions.
Jan 23: Lecture 2 and definition of M1 package.

Jan 24: In Lecture 3 you will find the descriptions of eight M1 instructions. The first, ILOAD, leads us to write

```
(defun execute-ILOAD (inst s)
  (make-state (+ 1 (pc s))
              (locals s)
              (push (nth (arg1 inst)
                         (locals s))
                    (stack s))
              (program s)))
```

Define the other seven instructions and come prepared on Monday to present them at the board.

Jan 30: We have defined seven of the eight instructions requested in Lecture 3. In particular, we defined the semantics of ILOAD, ICONST, IADD, ISUB, IMUL, ISTORE, and GOTO. Define the last instruction, IFEQ, and then answer the questions on pages 18 and 19 of the lecture. That will complete the definition of M1! I strongly urge you to build your own M1 and run it on some example programs.
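In *g-program* (Feb 6 entry), (IFEQ 10) at pc 3 jumps to pc 13, which suggests the offset is added to the IFEQ's own pc. Here is a Java sketch of one possible answer to the exercise under that reading, written in the same old-state-in, new-state-out style as execute-ILOAD. The Java names (M1Ifeq, Inst, State, executeIFEQ) are hypothetical stand-ins for the ACL2 definitions, not the course's code.

```java
import java.util.List;

// Sketch of M1's IFEQ in the functional style of execute-ILOAD: the old
// state is never modified; a new state is built instead.
final class M1Ifeq {
    // An M1 instruction such as (IFEQ 10): an op-code and one argument.
    record Inst(String op, int arg1) {}

    // An M1 state: pc, locals, stack (top at index 0), and the program.
    record State(int pc, List<Integer> locals, List<Integer> stack, List<Inst> program) {}

    // IFEQ: pop the top of the stack; if it was 0, add the signed offset
    // arg1 to the pc, otherwise advance to the next instruction.
    static State executeIFEQ(Inst inst, State s) {
        List<Integer> stk = s.stack();
        int newPc = (stk.get(0) == 0) ? s.pc() + inst.arg1() : s.pc() + 1;
        List<Integer> popped = List.copyOf(stk.subList(1, stk.size()));
        return new State(newPc, s.locals(), popped, s.program());
    }

    public static void main(String[] args) {
        // (IFEQ 10) at pc 3 of *g-program*, with x = 0 on the stack.
        State s = new State(3, List.of(0, 5, 20), List.of(0), List.of());
        System.out.println(executeIFEQ(new Inst("IFEQ", 10), s).pc());
    }
}
```

Running main prints 13, the pc of (ILOAD 2); with a nonzero value on top of the stack the pc would advance to 4 instead.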
Feb 1: Warren Hunt lectured today on how to prove
```
(implies (and (natp x) 
              (natp y))
         (equal (g x y 0)
                (* x y)))
```
where
```
(defun g (x y a)
  (if (zp x)
      a
 
      (g (- x 1) y (+ y a))))
```
The proof required two key steps: (a) generalization from (g x y 0) to (g x y a), and (b) induction. Here is an ACL2 script and the session log. You should be able to use the ACL2 system well enough to replay the script!
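Why step (a) is needed: the recursive call in g passes y + a, not 0, as the accumulator, so an inductive hypothesis about (g (- x 1) y 0) says nothing about the call that actually occurs. Strengthening the claim to an arbitrary a, for example (equal (g x y a) (+ a (* x y))) (one natural choice; the posted script may phrase the lemma differently), makes the hypothesis usable. A hand sketch of the resulting induction on x:

```latex
\begin{aligned}
&\text{Claim: } g(x, y, a) = x\,y + a \quad \text{for all natural numbers } x, y, a.\\
&\text{Base case } (x = 0)\text{: } g(0, y, a) = a = 0 \cdot y + a.\\
&\text{Inductive step } (x > 0)\text{: } g(x, y, a) = g(x-1,\, y,\, y + a)
  = (x-1)\,y + (y + a) = x\,y + a.
\end{aligned}
```

The middle equality of the inductive step instantiates the hypothesis with y + a in place of a; setting a = 0 then recovers the original theorem.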

Feb 6: We finished the definition of M1 and wrote and tested one simple M1 program, namely a program for computing g above. Specifically, we defined

```
(defconst *g-program*
  '((ICONST 0)   ; 0
    (ISTORE 2)   ; 1  a = 0;
; loop:
    (ILOAD 0)    ; 2
    (IFEQ 10)    ; 3  if x=0 then go to end;
    (ILOAD 0)    ; 4
    (ICONST 1)   ; 5
    (ISUB)       ; 6
    (ISTORE 0)   ; 7  x = x-1;
    (ILOAD 1)    ; 8
    (ILOAD 2)    ; 9
    (IADD)       ;10
    (ISTORE 2)   ;11  a = y+a;
    (GOTO -10)   ;12  go to loop
; end:
    (ILOAD 2)    ;13
    (HALT)       ;14 ``return'' a (actually, halt with a on stack).
    ))
```

and tested it on several inputs. This program may be described as taking natural number inputs x and y (in locals 0 and 1) and computing x*y by adding y to an initially 0 accumulator, a. The program ``returns'' the final value of a by leaving it on the stack and halting. Compare *g-program* to the JVM bytecode produced for the method g in the Java file Demo.java below:

```java
class Demo {
    public static int g(int x, int y) {
        int a = 0;
        while (x != 0) { x = x - 1; a = y + a; }
        return a;
    }

    public static void main(String[] args) {
        int x = Integer.parseInt(args[0], 10);
        int y = Integer.parseInt(args[1], 10);
        System.out.println(g(x, y));
        return;
    }
}
```

by compiling Demo.java and inspecting the bytecode with:

```
javac Demo.java
javap -c Demo
```

To test *g-program* (1) follow the instructions here to get the full definition of M1 available to your ACL2, (2) start your ACL2 and do:

```
(include-book "m1/m1") ; The first `m1/' indicates the directory where you installed m1.lisp.
(in-package "M1")      ; To get into the M1 package.
```

and finally (3), try something like:

```
(run (repeat 'tick 50)
     (make-state 0
                 (list 4 5)
                 nil
                 *g-program*))
```

to run the program on inputs 4 and 5. Inspect the final state and see if it meets your expectations. Try other inputs. I also recommend that you define and test other simple M1 programs, like factorial, sum the numbers below n, and perhaps one that indicates whether a natural number is even or odd by pushing 1 or 0 on the stack.
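To cross-check your expectations before (or after) running ACL2, here is a small Java re-implementation of M1 that replays the same experiment. It is a sketch under two stated assumptions: an op-code it does not handle (here HALT) leaves the state unchanged, and ISTORE to a local past the end of the locals list pads the list with 0s. All Java names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a tiny Java re-implementation of M1, used only to replay the
// *g-program* experiment. Assumption: any op-code not handled below (here,
// HALT) leaves the state unchanged, so extra ticks are harmless.
final class M1Run {
    record Inst(String op, int arg) {}
    record State(int pc, List<Integer> locals, List<Integer> stack) {}

    static Inst op(String name, int arg) { return new Inst(name, arg); }
    static Inst op(String name) { return new Inst(name, 0); }

    // *g-program*, transcribed instruction by instruction.
    static final List<Inst> G_PROGRAM = List.of(
        op("ICONST", 0), op("ISTORE", 2),                             // 0-1:  a = 0
        op("ILOAD", 0), op("IFEQ", 10),                               // 2-3:  if x = 0 go to 13
        op("ILOAD", 0), op("ICONST", 1), op("ISUB"), op("ISTORE", 0), // 4-7:  x = x-1
        op("ILOAD", 1), op("ILOAD", 2), op("IADD"), op("ISTORE", 2),  // 8-11: a = y+a
        op("GOTO", -10),                                              // 12:   go to 2
        op("ILOAD", 2), op("HALT"));                                  // 13-14

    // Stacks are lists with the top at index 0.
    static List<Integer> push(int v, List<Integer> stack) {
        List<Integer> out = new ArrayList<>();
        out.add(v);
        out.addAll(stack);
        return out;
    }

    static List<Integer> pop(List<Integer> stack) {
        return new ArrayList<>(stack.subList(1, stack.size()));
    }

    // One M1 step: build the successor state of s.
    static State step(State s, List<Inst> prog) {
        Inst inst = prog.get(s.pc());
        List<Integer> st = s.stack();
        int next = s.pc() + 1;
        switch (inst.op()) {
            case "ICONST": return new State(next, s.locals(), push(inst.arg(), st));
            case "ILOAD":  return new State(next, s.locals(), push(s.locals().get(inst.arg()), st));
            case "IADD":   return new State(next, s.locals(), push(st.get(1) + st.get(0), pop(pop(st))));
            case "ISUB":   return new State(next, s.locals(), push(st.get(1) - st.get(0), pop(pop(st))));
            case "IMUL":   return new State(next, s.locals(), push(st.get(1) * st.get(0), pop(pop(st))));
            case "ISTORE": {
                List<Integer> locs = new ArrayList<>(s.locals());
                while (locs.size() <= inst.arg()) locs.add(0); // pad with 0s (ACL2's update-nth pads with nil)
                locs.set(inst.arg(), st.get(0));
                return new State(next, locs, pop(st));
            }
            case "IFEQ":   return new State(st.get(0) == 0 ? s.pc() + inst.arg() : next, s.locals(), pop(st));
            case "GOTO":   return new State(s.pc() + inst.arg(), s.locals(), st);
            default:       return s; // HALT (or anything unknown): no change
        }
    }

    // Analogue of (run (repeat 'tick n) s): take n steps.
    static State run(int n, State s, List<Inst> prog) {
        for (int k = 0; k < n; k++) {
            s = step(s, prog);
        }
        return s;
    }

    public static void main(String[] args) {
        State fin = run(50, new State(0, List.of(4, 5), List.of()), G_PROGRAM);
        System.out.println("pc = " + fin.pc() + ", stack = " + fin.stack());
    }
}
```

It should print pc = 14, stack = [20]: after 49 steps the program reaches the HALT at pc 14 with 4*5 on the stack, and under the HALT assumption the 50th step changes nothing.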

Feb 8: I went through the first 72 pages of lecture06.pdf. (However, I added pages 4-8 after the lecture!) I recommend that you review them and look ahead to what we'll do next. I gave two ACL2 demos; the scripts for the demos are here.
Feb 13: I continued presenting lecture06.pdf (from Feb 8, above). I have fixed two subscripting typos, as promised. We got to page 128 in the pdf file.
Feb 15: I finished lecture06.pdf. (In the course of the lecture, and without mentioning it to the class, I noticed a typo in the proof of the inner loop, starting on page 138. I've fixed it.) I have posted the script used to verify *g-program*. It is written as a 7-step template. I recommend that you (a) replay this template for *g-program* in your ACL2 and then (b) modify it repeatedly to verify some other simple M1 programs.
Feb 20: We worked an example M1 proof in class. I started with the template, above, and edited it so that it was a proof of the total correctness of a bit-flipping even/odd program. I also announced that there will be no class either Monday, Feb 27 or Wednesday, Feb 29. However, I urge you to do the challenge examples mentioned above! We will have class on Monday, March 5, and then on Wednesday, March 7, we will have the midterm. It would be best for you to spend March 5 asking me about lessons learned on the challenge examples.
Feb 22: I described the midterm exam (to be held in class on March 7) as having 5-6 questions concerning:
- extending M1 with new instructions
- formalizing certain ideas commonly used, e.g., ``the number of items on the stack of state s'' and ``the indices of the variables (possibly) written by the program in s.'' The latter example is answered perfectly by:
```
(all-written-locals (program s))
```
where
```
(defun all-written-locals (program)
  (cond ((endp program) nil)
        ((equal (op-code (car program)) 'ISTORE)
         (cons (arg1 (car program))
               (all-written-locals (cdr program))))
        (t (all-written-locals (cdr program)))))
```
- formalizing the schedule function for a simple M1 program, and
- formalizing the theorems needed to prove a simple algorithm correct (e.g., the helper-is-theta and fn-is-theta theorems) from the template.
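For *g-program*, a schedule can be worked out by counting instructions along each path: 2 steps to initialize a, 11 per loop iteration (pc 2 through 12), and 3 on the exit path (ILOAD 0, IFEQ, ILOAD 2) to reach the HALT at pc 14, for 11x + 5 steps in all. The Java sketch below uses hypothetical names loopClock and gClock and simply returns step counts; in ACL2 one would instead build the list of ticks handed to run, and the posted template may structure its clock differently.

```java
// Hypothetical clock functions for *g-program*: the number of M1 steps from
// pc 0 to the HALT at pc 14 when local 0 holds a natural number x (x >= 0).
final class GSchedule {
    // From the loop head (pc 2): 11 steps per iteration while x > 0
    // (pc 2 through 12), then 3 steps (ILOAD 0, IFEQ 10, ILOAD 2) to reach pc 14.
    static int loopClock(int x) {
        return (x == 0) ? 3 : 11 + loopClock(x - 1);
    }

    // 2 steps (ICONST 0, ISTORE 2) to reach the loop head, then the loop.
    static int gClock(int x) {
        return 2 + loopClock(x);
    }

    public static void main(String[] args) {
        for (int x = 0; x <= 4; x++) {
            System.out.println("x = " + x + ": " + gClock(x) + " steps");
        }
    }
}
```

For x = 4 this gives 49, so the 50 ticks used in the Feb 6 experiment are enough, assuming that stepping HALT leaves the state unchanged.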
I also noted that the midterm handout will include m1.lisp, m1-support.lisp, and template.lisp. The test will be open notes, but I urge you not to cut down a lot of trees! The handouts included above are almost certain to contain all the examples you will need.
We then looked at some instructions in The Java Virtual Machine Specification and worked out how their analogues could be formalized in the context of M1, specifically IF<cond>, IF_ICMP<cond>, DUP, DUP_X1, SWAP, JSR, and RET.
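For the stack-manipulating instructions in that list, the stack effects given in the JVM specification carry over directly to an M1-style stack. A Java sketch (all names hypothetical) that keeps the top of the stack at index 0, as an ACL2 list used as a stack would. For IF_ICMPLT the offset is treated as relative to the branch's own pc, matching M1's IFEQ and GOTO, but counted in instructions rather than the JVM's bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of M1-style analogues of a few JVM stack instructions. Stacks are
// lists with the top at index 0; the stack effects follow the JVM spec.
final class StackOps {
    static List<Integer> push(int v, List<Integer> st) {
        List<Integer> out = new ArrayList<>();
        out.add(v);
        out.addAll(st);
        return out;
    }

    // DUP:    ..., v       -> ..., v, v
    static List<Integer> dup(List<Integer> st) {
        return push(st.get(0), st);
    }

    // SWAP:   ..., v2, v1  -> ..., v1, v2
    static List<Integer> swap(List<Integer> st) {
        List<Integer> rest = st.subList(2, st.size());
        return push(st.get(1), push(st.get(0), rest));
    }

    // DUP_X1: ..., v2, v1  -> ..., v1, v2, v1
    static List<Integer> dupX1(List<Integer> st) {
        List<Integer> rest = st.subList(2, st.size());
        return push(st.get(0), push(st.get(1), push(st.get(0), rest)));
    }

    // IF_ICMPLT offset: pops value2 (the top) and value1 (below it) and
    // branches if value1 < value2. Returns only the new pc; the new stack
    // is st with its top two items removed.
    static int ifIcmpltPc(int pc, int offset, List<Integer> st) {
        int value2 = st.get(0);
        int value1 = st.get(1);
        return (value1 < value2) ? pc + offset : pc + 1;
    }

    public static void main(String[] args) {
        List<Integer> st = List.of(1, 2, 3); // top is 1
        System.out.println(dup(st) + " " + swap(st) + " " + dupX1(st));
    }
}
```

With the stack [1, 2, 3] (top first), main prints [1, 1, 2, 3] [2, 1, 3] [1, 2, 1, 3].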
Finally, I briefly talked about stack maps but we'll cover them in more detail in the future. They will not be on the exam, but the algorithm for computing them requires the kind of formalization skills previously mentioned.
Mar 5: I answered questions about the midterm and I worked the M1 Fibonacci example. The Fibonacci example is somewhat more complicated than any I'd expect you to do on the midterm but it has two important features: the loop has more than one IFEQ in it (so the schedule is more complicated) and there are two distinct exit points (so the specification of the final pc is more complicated). But the moral is simple: use IF to say what is true. Remember the midterm is this coming Wednesday, Mar 7. See the description of the midterm in the bullet above.
Mar 7: The Midterm Exam was taken today. The answers are here. In addition, the answers are elaborated in two proof files that demonstrate the correctness of the answers to problems (5) and (6), see m1/proofs/midterm-pi.lisp and m1/proofs/power.lisp.
Mar 19 and 21: I gave a lecture on The Turing Equivalence of M1, showing how I proved, with ACL2, that M1 can simulate a Turing machine. As noted in the lecture, the whole point of having formal machine models is to allow us to prove things about them and this lecture illustrates a particularly interesting property of M1, proved formally.

Mar 26: I introduced M5. M5 is a more complete model of a JVM-like machine. It supports multiple threads, a heap containing Objects that are instances of classes, a variety of method invocations, synchronization primitives, and exceptions.

By way of comparison, M1, which we have been studying until now, is less than 4 pages of ACL2 code. M5 is about 28 pages of ACL2 code. It will allow us to study the formalization of the features mentioned above while still being something you can master in the time remaining. Our most accurate model of the JVM, M6, is 160 pages of ACL2 code plus about 500 pages of built-in classes from the Java API.

I have posted M5 at m5/README.html. I urge you to certify those books on your local version of ACL2, then:

```
(include-book "m5/m5")
(in-package "M5")
(defconst *s*
  '(STATE
    (:TT ((THREAD (:ID 0)
                  (:CS ((FRAME (:PC 0)
                               (:LOCS NIL)
                               (:STK (5))
                               (:MLOC ("Math" "main" 2)))))
                  (:STAT ACTIVE)
                  (:REF NIL))))
    (:HP NIL)
    (:CT ((CLASS (:NAME "Object")
                 (:SUPERS NIL)
                 (:FIELDS ("monitor" "mcount" "wait-set"))
                 (:METHODS NIL))
          (CLASS (:NAME "Thread")
                 (:SUPERS ("Object"))
                 (:FIELDS NIL)
                 (:METHODS ((METHOD (:NAME "run")
                                    (:FORMALS NIL)
                                    (:SYNC NIL)
                                    (:CODE ((RETURN)))
                                    (:XTBL NIL))
                            (METHOD (:NAME "start")
                                    (:FORMALS NIL)
                                    (:SYNC NIL)
                                    (:CODE NIL)
                                    (:XTBL NIL))
                            (METHOD (:NAME "stop")
                                    (:FORMALS NIL)
                                    (:SYNC NIL)
                                    (:CODE NIL)
                                    (:XTBL NIL)))))
          (CLASS (:NAME "Math")
                 (:SUPERS ("Object"))
                 (:FIELDS NIL)
                 (:METHODS ((METHOD (:NAME "fact")
                                    (:FORMALS (N))
                                    (:SYNC NIL)
                                    (:CODE ((LOAD 0)
                                            (IFEQ 8)
                                            (LOAD 0)
                                            (LOAD 0)
                                            (CONST -1)
                                            (ADD)
                                            (INVOKESTATIC ("Math" "fact" 1))
                                            (MUL)
                                            (XRETURN)
                                            (CONST 1)
                                            (XRETURN)))
                                    (:XTBL NIL))
                            (METHOD (:NAME "app")
                                    (:FORMALS (X Y))
                                    (:SYNC NIL)
                                    (:CODE ((LOAD 0)
                                            (LOAD 1)
                                            (INVOKESTATIC ("Math" "fact" 1))
                                            (ADD)
                                            (XRETURN)))
                                    (:XTBL NIL))
                            (METHOD (:NAME "main")
                                    (:FORMALS NIL)
                                    (:SYNC NIL)
                                    (:CODE ((INVOKESTATIC ("Math" "fact" 1))
                                            (INVOKESTATIC ("Math" "app" 2))))
                                    (:XTBL NIL)))))))))
```

and then practice accessing pieces of this state by choosing a piece to target and then evaluating an expression that is supposed to return it. For example, the ``mloc of the top frame on the call stack of thread 0'' is computed by either

```
(mloc (top (cs (find :id 0 (tt *s*)))))
```

or

```
(get :mloc (top (get :cs (find :id 0 (tt *s*)))))
```

both of which return

```
("Math" "main" 2)
```

Make up examples of your own and test your thinking.
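The same accessor chain can be mirrored in a typed language to check the reading of each step. The Java records below are a hypothetical rendering of just the pieces the example touches (M5 itself uses the keyword-tagged lists shown in *s*), and they assume, as with M1's stack, that the top of the call stack is its first element.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical typed rendering of part of an M5 state. M5 itself uses the
// keyword-tagged lists shown in *s*; these records only mirror the nesting.
final class M5Access {
    record Frame(int pc, List<Integer> locs, List<Integer> stk, List<Object> mloc) {}
    record ThreadEntry(int id, List<Frame> cs, String stat) {}

    // Analogue of (find :id n tt): the first thread with the given id, if any.
    static Optional<ThreadEntry> findThread(int id, List<ThreadEntry> tt) {
        return tt.stream().filter(t -> t.id() == id).findFirst();
    }

    // Analogue of (top cs): the most recent frame is kept first.
    static Frame top(List<Frame> cs) {
        return cs.get(0);
    }

    public static void main(String[] args) {
        Frame f = new Frame(0, List.of(), List.of(5), List.of("Math", "main", 2));
        List<ThreadEntry> tt = List.of(new ThreadEntry(0, List.of(f), "ACTIVE"));
        // Analogue of (mloc (top (cs (find :id 0 (tt *s*)))))
        System.out.println(top(findThread(0, tt).orElseThrow().cs()).mloc());
    }
}
```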

Apr 2: I gave a lecture on INVOKESTATIC and xRETURN. The example I used and the commands to ``snapshot'' certain intermediate states are here. I recommend that you think of a simple, recursive arithmetic function (e.g., Fibonacci, sum of squares, greatest common divisor, etc.) and code it up as a recursive method in M5 and then snapshot its execution to cement your understanding of the basic call and return mechanism. Then in your local model of M5 you might change the notion of method resolution so that the method invoked is influenced by the number of arguments in the descriptor provided to INVOKESTATIC. That is, instead of finding a method of the given name in the superclass chain of the given class, find the method of a given name and input arity in the superclass chain. Then code an example class with two methods with the same name and demonstrate that your semantics resolves to the ``right'' method.
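A sketch of the suggested resolution change, in Java rather than ACL2 and over a hypothetical class table (the class Calc and the method describe are made up for illustration). The lookup walks the named class and then its :SUPERS list; the only difference between the two resolution rules is the predicate applied to each method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical typed stand-in for an M5 class table: each class lists its
// superclass chain (like :SUPERS) and its methods (name and number of formals).
final class Resolve {
    record Method(String name, int arity) {}
    record ClassDecl(List<String> supers, List<Method> methods) {}

    // Search the named class first, then each class on its :SUPERS chain.
    static Optional<Method> find(String cls, Map<String, ClassDecl> ct, Predicate<Method> wanted) {
        List<String> chain = new ArrayList<>();
        chain.add(cls);
        chain.addAll(ct.get(cls).supers());
        for (String c : chain) {
            for (Method m : ct.get(c).methods()) {
                if (wanted.test(m)) return Optional.of(m);
            }
        }
        return Optional.empty();
    }

    // Resolution by name only.
    static Optional<Method> byName(String cls, String name, Map<String, ClassDecl> ct) {
        return find(cls, ct, m -> m.name().equals(name));
    }

    // The suggested variant: resolution by name and arity (from the descriptor).
    static Optional<Method> byNameAndArity(String cls, String name, int arity, Map<String, ClassDecl> ct) {
        return find(cls, ct, m -> m.name().equals(name) && m.arity() == arity);
    }

    // Example table: "Calc" has two "add" methods that differ only in arity.
    static Map<String, ClassDecl> exampleTable() {
        return Map.of(
            "Object", new ClassDecl(List.of(), List.of(new Method("describe", 0))),
            "Calc", new ClassDecl(List.of("Object"),
                    List.of(new Method("add", 1), new Method("add", 2))));
    }

    public static void main(String[] args) {
        Map<String, ClassDecl> ct = exampleTable();
        System.out.println(byName("Calc", "add", ct));            // always the first "add"
        System.out.println(byNameAndArity("Calc", "add", 2, ct)); // the two-argument "add"
    }
}
```

With two add methods in Calc, name-only resolution always returns whichever comes first, while the arity-aware version picks the one matching the descriptor's argument count.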
Apr 4: I gave a lecture on NEW, GETFIELD, and PUTFIELD. The example I used and the commands to snapshot certain intermediate states are here. However, I modified the script from that used in the lecture so that (a) I popped the extraneous 1 off the stack after demonstrating the behavior of GETFIELD and (b) I eliminated certain intermediate states in the snapshots, just showing the before and after shots around the relevant NEW, PUTFIELD, and GETFIELD instructions.
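The three heap instructions can be sketched over a deliberately generic heap: integer references mapped to objects, each object a map from field names to values. This is not M5's :HP representation, and all Java names are hypothetical; starting fields at 0 follows the JVM default for int fields. Each operation returns a new heap, in the same state-in, state-out style as the M1 and M5 step functions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Generic, hypothetical heap model: references are integers and each object
// maps field names to int values. M5's real :HP layout differs in detail.
// Every operation returns a new heap and leaves its argument untouched.
final class HeapSketch {
    record Alloc(int ref, Map<Integer, Map<String, Integer>> heap) {}

    // NEW: pick a fresh reference and create an object whose fields start at 0.
    // heap.size() is fresh because this sketch allocates 0, 1, 2, ... and never frees.
    static Alloc newObject(Map<Integer, Map<String, Integer>> heap, List<String> fields) {
        int ref = heap.size();
        Map<String, Integer> obj = new HashMap<>();
        for (String f : fields) obj.put(f, 0);
        Map<Integer, Map<String, Integer>> h = new HashMap<>(heap);
        h.put(ref, obj);
        return new Alloc(ref, h);
    }

    // PUTFIELD: ..., ref, value -> ...  (copy the object, then update one field)
    static Map<Integer, Map<String, Integer>> putField(
            Map<Integer, Map<String, Integer>> heap, int ref, String field, int value) {
        Map<String, Integer> obj = new HashMap<>(heap.get(ref));
        obj.put(field, value);
        Map<Integer, Map<String, Integer>> h = new HashMap<>(heap);
        h.put(ref, obj);
        return h;
    }

    // GETFIELD: ..., ref -> ..., value
    static int getField(Map<Integer, Map<String, Integer>> heap, int ref, String field) {
        return heap.get(ref).get(field);
    }

    public static void main(String[] args) {
        Alloc a = newObject(Map.of(), List.of("x", "y"));
        Map<Integer, Map<String, Integer>> h2 = putField(a.heap(), a.ref(), "x", 7);
        System.out.println(getField(h2, a.ref(), "x") + " " + getField(a.heap(), a.ref(), "x"));
    }
}
```

Main prints 7 0: the updated heap sees the new value, while the heap passed to PUTFIELD is unchanged.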
Apr 9: I gave a lecture on INVOKEVIRTUAL and INVOKESPECIAL. The M5 demonstration I gave is here. The comments in the demo mention two Java files, Point.java and ColoredPoint.java that, when compiled with javac, illustrate actual JVM bytecode similar to (but not identical to) the M5 demo. At the end of the lecture script is a script that answers the challenge raised last time: how do you build a circular Object of n linked Objects on M5?
Apr 11: I gave a lecture on Threads. The lecture was dominated by a demonstration of a pointless M5 program that starts two threads which endlessly increment a thread-local variable. That demonstration is obtained by stepping various threads starting in the state *r* of lecture16.lisp. Then I presented a puzzle in which two threads are endlessly competing for a single resource, a counter initially 1, where each thread reads the counter twice, adds the results, and stores it back to the global counter. The question is whether you can invent a schedule that will make this system, called *s* in lecture16.lisp, produce any given natural number n by defining (schedule n) appropriately. I urge you to think about this -- and not to search for the answer online! Furthermore, when you think about it, don't think about M5 state *s* but think about the problem in a more abstract setting. If you think you know how to solve it, formalize in ACL2 a simple model of this problem (from scratch, not involving M5) and test your solution there. It is almost always pointless to analyze actual code, even for a machine as simple as M5, before you've analyzed an appropriate abstraction. Indeed, harking back to our M1 code proofs: first you prove the algorithm correct, then you prove that the M1 code implements the algorithm. So for this example the lesson is: first you produce an abstract model of the puzzle and solve it, then you implement that solution in M5 bytecode.
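Following the advice to start from an abstraction, here is one possible abstract model of the puzzle, written in Java and independent of M5. It treats each read and the write as atomic steps and deliberately stops short of the interesting part: it does not define schedule(n). It only lets you replay a candidate schedule and observe the counter. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Abstract model of the Apr 11 puzzle, deliberately independent of M5.
// Each thread cycles through three atomic steps:
//   pc 0: r1 := counter;  pc 1: r2 := counter;  pc 2: counter := r1 + r2.
// A schedule is a list of thread ids; each id takes one step of that thread.
final class CounterPuzzle {
    record Local(int pc, long r1, long r2) {}
    record World(long counter, List<Local> threads) {}

    static World step(World w, int tid) {
        Local t = w.threads().get(tid);
        long counter = w.counter();
        Local next;
        if (t.pc() == 0) {
            next = new Local(1, counter, t.r2());
        } else if (t.pc() == 1) {
            next = new Local(2, t.r1(), counter);
        } else {
            counter = t.r1() + t.r2();
            next = new Local(0, t.r1(), t.r2());
        }
        List<Local> threads = new ArrayList<>(w.threads());
        threads.set(tid, next);
        return new World(counter, threads);
    }

    static World run(World w, List<Integer> schedule) {
        for (int tid : schedule) {
            w = step(w, tid);
        }
        return w;
    }

    // Counter initially 1; both threads about to perform their first read.
    static World initial() {
        Local idle = new Local(0, 0, 0);
        return new World(1, List.of(idle, idle));
    }

    public static void main(String[] args) {
        System.out.println(run(initial(), List.of(0, 0, 0, 1, 1, 1)).counter());
        System.out.println(run(initial(), List.of(0, 1, 0, 1, 0, 1)).counter());
    }
}
```

Running thread 0 through one full cycle and then thread 1 yields 4, while strictly alternating single steps yields 2. Finding a schedule for an arbitrary n is left as the puzzle intends.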
Apr 16: I gave a lecture on the JVM synchronization primitives, MONITORENTER and MONITOREXIT. The M5 demo I gave is lecture17.lisp. I also discussed the Apprentice Challenge, which involves proving that a certain trivial Java program achieves mutual exclusion but exposes a delicacy: one must not change the reference object of a thread once the thread has been started.
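The reentrant behavior of MONITORENTER and MONITOREXIT described in the JVM specification can be sketched with an owner and an entry count. Reading the "monitor" and "mcount" fields of Object in the M5 state *s* as that owner and count is this sketch's assumption, not a description of m5.lisp; the Java names are hypothetical, and blocking is represented by an empty result rather than by rescheduling the thread.

```java
import java.util.Optional;

// Sketch of JVM monitor semantics (owner plus entry count). Treating M5's
// "monitor" and "mcount" fields as owner and count is an assumption.
final class MonitorSketch {
    static final int FREE = -1; // hypothetical "no owner" marker

    record Monitor(int owner, int count) {}

    // MONITORENTER by thread tid. An empty result means tid must block.
    static Optional<Monitor> enter(Monitor m, int tid) {
        if (m.count() == 0) return Optional.of(new Monitor(tid, 1));
        if (m.owner() == tid) return Optional.of(new Monitor(tid, m.count() + 1));
        return Optional.empty();
    }

    // MONITOREXIT by thread tid: only the owner may exit; the monitor is
    // released when the count returns to 0.
    static Monitor exit(Monitor m, int tid) {
        if (m.count() == 0 || m.owner() != tid) {
            throw new IllegalMonitorStateException("thread " + tid + " does not own the monitor");
        }
        int c = m.count() - 1;
        return new Monitor(c == 0 ? FREE : tid, c);
    }

    public static void main(String[] args) {
        Monitor free = new Monitor(FREE, 0);
        Monitor once = enter(free, 0).orElseThrow();
        System.out.println(enter(once, 1).isPresent()); // thread 1 must block
        Monitor twice = enter(once, 0).orElseThrow();   // re-entry by the owner
        System.out.println(exit(exit(twice, 0), 0).equals(free));
    }
}
```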
Apr 18: I gave a lecture on the JVM primitives for handling exceptions, namely THROW and the exception table associated with every method. The M5 demo I gave is lecture18.lisp.
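Handler lookup can be sketched following the JVM specification: entries are tried in order, and the first whose pc range covers the throwing instruction and whose catch type is the thrown class or one of its superclasses supplies the new pc. The entry layout, the SUPERS map, and all Java names are hypothetical; M5's :XTBL entries may be organized differently.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;

// Sketch of exception-handler lookup in the style of the JVM spec. The entry
// layout and the supers map are hypothetical; M5's :XTBL format may differ.
final class XtblSketch {
    // start is inclusive and end exclusive, as in the JVM's exception table;
    // a null catch type matches any exception (as the JVM uses for finally).
    record Entry(int startPc, int endPc, int handlerPc, String catchType) {}

    // Entries are tried in order; the first one covering pc whose catch type is
    // the thrown class or one of its superclasses wins.
    static OptionalInt findHandler(List<Entry> xtbl, int pc, String thrown,
                                   Map<String, List<String>> supers) {
        for (Entry e : xtbl) {
            boolean covers = e.startPc() <= pc && pc < e.endPc();
            boolean catches = e.catchType() == null
                    || e.catchType().equals(thrown)
                    || supers.getOrDefault(thrown, List.of()).contains(e.catchType());
            if (covers && catches) return OptionalInt.of(e.handlerPc());
        }
        // No handler here: the JVM pops this frame and repeats the search in the caller.
        return OptionalInt.empty();
    }

    static final Map<String, List<String>> SUPERS = Map.of(
            "ArithmeticException", List.of("RuntimeException", "Exception", "Throwable", "Object"));

    public static void main(String[] args) {
        List<Entry> xtbl = List.of(
                new Entry(0, 5, 20, "NullPointerException"),
                new Entry(0, 10, 30, "RuntimeException"));
        System.out.println(findHandler(xtbl, 3, "ArithmeticException", SUPERS));  // skips the first entry
        System.out.println(findHandler(xtbl, 12, "ArithmeticException", SUPERS)); // outside both ranges
    }
}
```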
Apr 23: I discussed how to prepare for the Final Test, which is Wednesday, May 2, in class. In particular, I asked and we answered a practice final. A sample state, the questions, and our answers are given here.
Apr 25: I discussed M6, the most accurate ACL2 model to date of the Java Virtual Machine.
Apr 30: I discussed the recently completed proof by Erik Toibazarov and me that an M3 bytecode implementation of the preprocessing for a version of the Boyer-Moore fast string searching algorithm is correct. Erik's honors thesis is here.
May 2: Final test. The supporting material handed out for the final test included a sample M5 state, and the definition of M5 as found in m5/m5.lisp. Here are my answers, although yours may differ and still be correct. The grade distribution is here.
The End

Summary

We will study a formal specification of the Java Virtual Machine (JVM). The JVM is a stack-based, object-oriented, type-safe bytecode (assembly language) interpreter on which compiled Java programs are executed.

But the focus of the course will be on teaching you how to formalize a comparably complicated computing artifact and how to subsequently use that formalization. That is, we will be more interested in formalization techniques than in the JVM specifically.

You will learn about four different things: how to make a mathematical model of a complicated digital artifact like the JVM, how to program in a simple functional language, how to reason about such models, and how to use a powerful automatic reasoning tool.

Textbooks

The Java Virtual Machine Specification. Tim Lindholm and Frank Yellin. Addison-Wesley, ISBN 0-201-63452-X (or latest edition). Sun (now Oracle) distributes this for free.
Computer-Aided Reasoning: An Approach. Matt Kaufmann, Panagiotis Manolios, J Moore. This will also be available for sale at cost ($20) from the instructor during the first week of class. (This textbook is optional. I think you can get along without it. Just ask me questions and learn to use the online documentation!)

Other Useful Resources

We will use the ACL2 theorem prover. If you want to run it on a UTCS Linux machine type /p/bin/acl2. Most users install it on their laptops (see below). And most users either run it in an Emacs shell buffer or via an Eclipse (``ACL2 Sedan'') interface (see below).
The ACL2 Home Page, which includes sources, installation instructions, and user documentation for the programming language and the theorem prover
Eclipse ACL2 Sedan installation instructions
Eclipse ACL2 Sedan Installation FAQ
How to Use the ACL2 Sedan
A Gentle Introduction to ACL2 Programming
ACL2 Programming Exercises
Introduction to the Theorem Prover

Grades

30% — attendance in class
25% — class participation (asking questions, answering questions, presenting solutions at the board) and office visits
15% — Midterm Test, Wednesday, March 7, 2012
35% — Last Test, Wednesday, May 2, 2012

You will note that 105% is accounted for above. The extra 5% may be considered “slack points” so that, for example, you may miss several classes and still make a perfect 100%.

Extra Credit: Extra credit will be given for projects presented at the end of the semester. Possibilities for projects will be discussed from time to time in class. If you have a project proposal, discuss it with me before you invest time in it. You may work with others on projects.

Pre-Requisites

Upper-division standing is required for all CS378 classes.

If the question is “What do I have to know in order to do well in this course?” as opposed to “What are the university rules?” the answer is: mathematical logic, including induction, and some experience programming in some language, preferably Java. You should be able to use Eclipse or Emacs. Experience with Lisp or ACL2 is helpful but the subset we use is relatively small and will be taught (quickly).

Tools

I will teach you how to define and run programs in the ACL2 programming language and to use the ACL2 theorem prover. You may use either the Eclipse or Emacs interface to ACL2. Both are available on the CS Department's public machines. You may also wish to install ACL2, Emacs, and/or Eclipse on your own machine.

See How to Use ACL2s to get started.

Lecture and Discussion Schedule

We'll approach the JVM model incrementally, starting with a very simple (suggestive but inaccurate) model. Then we will extend and revise it repeatedly toward a more accurate description of the JVM. We'll learn the necessary functional programming and proof techniques by building the simplest model. Most of the semester will be spent extending and exploring more elaborate models.

You will be expected to do much of the formalization work here, and extra-credit project ideas may come out of our discussions. For example, good projects might include the formalization or elaboration of features not discussed in class or the mechanized proofs of some of the properties discussed.

We will adhere pretty closely to the following sequence of topics. But since many classes will be presentations by students in answer to questions raised by the instructor, the pace may vary somewhat.

All dates below are speculative.

Wed, Jan 18: Introduction
Mon, Jan 23: building M1 -- functional programming in ACL2
Wed, Jan 25: building M1 -- functional programming in ACL2
Mon, Jan 30: building M1 -- functional programming in ACL2
Wed, Feb 1: reasoning about M1 “by hand”
Mon, Feb 6: reasoning about M1 “by hand”
Wed, Feb 8: quick introduction to how ACL2's prover works
Mon, Feb 13: mechanized proofs about M1
Wed, Feb 15: mechanized proofs about M1
Mon, Feb 20: mechanized proofs about M1
Wed, Feb 22: mechanized proofs about M1
Mon, Feb 27: the class table, the heap, and threads
Wed, Feb 29: macros for managing an elaborate state
Mon, Mar 5: M5, a fairly realistic JVM model
Wed, Mar 7: Midterm Test
Mon, Mar 12: Spring Break
Wed, Mar 14: Spring Break
Mon, Mar 19: object creation and manipulation
Wed, Mar 21: method resolution and invocation
Mon, Mar 26: threads and monitors
Wed, Mar 28: M5
Mon, Apr 2: mechanized proofs about M5
Wed, Apr 4: mechanized proofs about M5
Mon, Apr 9: mechanized proofs about M5
Wed, Apr 11: mechanized proofs about M5
Mon, Apr 16: extending M5
Wed, Apr 18: extending M5
Mon, Apr 23: extending M5
Wed, Apr 25: M6, an accurate JVM model
Mon, Apr 30: M6, an accurate JVM model
Wed, May 2: Last Test

Other Administrative Matters

Religious Holy Days: A student who is absent from an examination or cannot meet an assignment deadline due to the observance of a religious holy day may take the examination on an alternate day, submit the assignment up to 24 hours late without penalty, or be excused from the examination or assignment, if proper notice of the planned absence has been given. Notice must be given at least fourteen days prior to the classes scheduled on dates the student will be absent. For religious holy days that fall within the first two weeks of the semester, notice should be given on the first day of the semester. It must be personally delivered to the instructor and signed and dated by the instructor, or sent via certified mail, return receipt requested. Email notification will be accepted if received, but a student submitting such notification must receive email confirmation from the instructor. A student who fails to complete missed work within the time allowed will be subject to the normal academic penalties.

Disability Related Needs: Please notify me of any modification/adaptation you may require to accommodate a disability-related need. You will be requested to provide documentation to the Office of the Dean of Students in order that the most appropriate accommodations can be determined. Specialized services are available on campus through Services for Students with Disabilities, SSB 4th floor, A5800, 471-6259, TTY 471-4641

Emergencies and Illness: Documented emergencies and illnesses will be dealt with by the instructor. For best results, communicate with me before you miss a midterm or the final and be prepared to supply written, verifiable evidence of the condition.

Code of Conduct: For important other advice about expectations and conduct, see The Computer Sciences Department Rules to Live By.
