A tutorial on how to use the parallelism library.
This documentation topic relates to the experimental extension of ACL2 supporting parallel execution and proof; see parallelism.
In this topic we introduce the ACL2 parallelism primitives using the example of a doubly-recursive Fibonacci function, whose basic definition is as follows. See parallelism for a very high-level summary of the parallelism capability described here, and see compiling-ACL2p for how to build an executable image that supports parallel execution. Here, we assume that such an executable is being used.
(defun fib (x) (declare (xargs :guard (natp x))) (cond ((or (zp x) (<= x 0)) 0) ((= x 1) 1) (t (+ (fib (- x 1)) (fib (- x 2))))))
A simple way to introduce parallelism into this function definition is to wrap the addition expression with a call of pargs, and the arguments to the addition will be computed in parallel whenever resources are available. As a result, we end up with a very similar and thus intuitive function definition. Note that we replaced + by binary-+, since pargs expects a function call, not a macro call.
(defun pfib (x) (declare (xargs :guard (natp x))) (cond ((or (zp x) (<= x 0)) 0) ((= x 1) 1) (t (pargs (binary-+ (pfib (- x 1)) (pfib (- x 2)))))))
Introducing the Granularity Problem
After you submit the above two versions of the Fibonacci function, test them with the following forms.
(time$ (fib 10)) (time$ (pfib 10))
Now increase the argument by increments of 5 to 10 until you find your curiosity satisfied or your patience wearing thin. You can interrupt evaluation if necessary and return to the ACL2 loop. You will immediately notice that you have not increased execution speed, at least not by much, by introducing parallelism.
First, consider the computation of
Second, we must realize that if we have two threads available for computing
In summary: We want to tell ACL2 that it can evaluate the arguments of
pargs in parallel only when the parameter of
Explicit Programming for the Granularity Problem
One way to avoid the granularity problem is to duplicate code as follows.
(defun pfib (x) (declare (xargs :guard (natp x))) (cond ((or (zp x) (<= x 0)) 0) ((= x 1) 1) (t (if (> x 27) ; the granularity test (pargs (binary-+ (pfib (- x 1)) (pfib (- x 2)))) (binary-+ (pfib (- x 1)) (pfib (- x 2)))))))
Duplicating code is fundamentally a bad design principle, because it can double the work for future maintenance. A ``granularity form'' is an expression
(declare (granularity <form>))
that can allow threads to be spawned (without duplicating code) whenever
the evaluation of
(defun pfib (x) (declare (xargs :guard (natp x))) (cond ((or (zp x) (<= x 0)) 0) ((= x 1) 1) (t (pargs (declare (granularity (> x 27))) (binary-+ (pfib (- x 1)) (pfib (- x 2)))))))
Test each version as follows (or substitute your own natural number for 33).
(time$ (fib 33)) (time$ (pfib 33))
Another Granularity Issue Related to Thread Limitations
Our implementation of parallelism primitives has the property that once a thread is assigned a computation, that assignment stays in effect until the computation is complete. In particular, if a thread encounters a parallelism primitive that spawns child threads, the parent thread stays assigned, waiting until the child computations complete before it can continue its own computation. In the meantime, the parent thread reduces the number of additional threads that Lisp can provide by 1, rather than being reassigned to do other work.
How can this lack of reassignment affect the user? Consider, for example,
the application of a recursive function to a long list. Imagine that this
function is written so that the body contains a recursive call, for example as
Possible solutions may include reworking of algorithms (for example to be tree-based rather than list-based) or using appropriate granularity forms. We hope that future implementations will allow thread ``re-deployment'' in order to mitigate this problem. This problem applies to plet, pargs, pand, por, and spec-mv-let.
We can use a let binding to compute the recursive calls of
(defun fib (x) (declare (xargs :guard (natp x))) (cond ((or (zp x) (<= x 0)) 0) ((= x 1) 1) (t (let ((a (fib (- x 1))) (b (fib (- x 2)))) (+ a b)))))
By using the parallelism primitive plet, we can introduce parallelism much as we did using pargs, with an optional granularity form, as follows.
(defun pfib (x) (declare (xargs :guard (natp x))) (cond ((or (zp x) (<= x 0)) 0) ((= x 1) 1) (t (plet (declare (granularity (> x 27))) ((a (pfib (- x 1))) (b (pfib (- x 2)))) (+ a b)))))
Notice that this time, we were able to use
Introducing Pand (By Way of a Tree Validation Example)
Consider ``genetic trees'' that contains leaves of DNA elements, in the
sense that each tip is one of the symbols
(defun valid-tip (tip) (declare (xargs :guard t)) (or (eq tip 'A) (eq tip 'G) (eq tip 'C) (eq tip 'T)))
Now we define a function that traverses the tree, checking that every tip is valid.
(defun valid-tree-serial (tree) (declare (xargs :guard t)) (if (atom tree) (valid-tip tree) (and (valid-tree-serial (car tree)) (valid-tree-serial (cdr tree)))))
We also define a parallel version.
(defun valid-tree-parallel (tree) (declare (xargs :guard t)) (if (atom tree) (valid-tip tree) (pand (valid-tree-parallel (car tree)) (valid-tree-parallel (cdr tree)))))
Before we can time the functions, we need to create test trees. We have found that test trees need to be approximately 1 gigabyte before we clearly see speedup, and we make them asymmetric to demonstrate the ability of our implementation to adapt to asymmetric data. We can create the trees with the execution of the following forms.
(defun make-valid-binary-tree (x) (declare (xargs :mode :program)) (if (< x 0) (cons (cons 'C 'G) (cons 'A 'T)) (cons (make-valid-binary-tree (- x 2)) ; 2 to make asymmetric (make-valid-binary-tree (- x 1))))) (defconst *valid-tree* (make-valid-binary-tree 30)) ; takes awhile (defconst *invalid-tree* (cons *valid-tree* nil)) ; nil is invalid tip
We can time our functions with the forms:
(time$ (valid-tree-serial *valid-tree*)) (time$ (valid-tree-parallel *valid-tree*))
Unfortunately, the serial version runs faster than the parallelism version; however, there is more to this story.
Demonstrating Early Termination with an Invalid Tree
Now observe this magic:
(time$ (valid-tree-serial *invalid-tree*)) (time$ (valid-tree-parallel *invalid-tree*))
The serial version took awhile, while the parallel version finished almost
immediately. The test for validation was split into testing the car
and the cdr of the
Granularity with Pand
We can also define a parallel version with a granularity form:
(defun valid-tree-parallel (tree depth) (declare (xargs :guard (natp depth))) (if (atom tree) (valid-tip tree) (pand (declare (granularity (< depth 5))) (valid-tree-parallel (car tree) (1+ depth)) (valid-tree-parallel (cdr tree) (1+ depth)))))
We can test this form by executing our previous forms. You will probably
find some speedup on a machine with several cores available, but more speedup
can likely be obtained with an expensive test on the leaves in place of
(time$ (valid-tree-serial *valid-tree*)) (time$ (valid-tree-parallel *valid-tree* 0))
Por can be used in the same way as pand, but with early
termination occurring when an argument evaluates to a non-