limit the amount of parallelism
Major Section:  PARALLELISM

Some function calls are on arguments whose evaluation time is long enough to warrant parallel evaluation, while others are not. A granularity form can be used to make appropriate restrictions on the use of parallelism.

For example, consider the Fibonacci function. Experiments have suggested that execution time can be reduced if whenever the argument is less than 27, a serial version of the Fibonacci function is called. One way to utilize this information is to write two definitions of the Fibonacci function, one serial and one parallel.

(defun fib (x)
  (declare (xargs :guard (natp x)))
  (cond ((or (zp x) (<= x 0)) 0)
        ((= x 1) 1)
        (t (binary-+ (fib (- x 1))
                     (fib (- x 2))))))

(defun pfib (x) (declare (xargs :guard (natp x))) (cond ((or (zp x) (<= x 0)) 0) ((= x 1) 1) ((< x 27) (binary-+ (fib (- x 1)) (fib (- x 2)))) (t (pargs (binary-+ (pfib (- x 1)) (pfib (- x 2)))))))

We realize quickly that writing both of these function definitions is cumbersome and redundant. This problem can be avoided by using a granularity declaration with a parallelism primitive. This form ensures that a call is parallelized only if resources are available and the granularity form evaluates to a non-nil value at the time of the call. Below is a definition of the Fibonacci function using a granularity form.
(defun pfib (x)
  (declare (xargs :guard (natp x)))
  (cond ((or (zp x) (<= x 0)) 0)
        ((= x 1) 1)
        (t (pargs (declare (granularity (>= x 27)))
                  (binary-+ (pfib (- x 1))
                            (pfib (- x 2)))))))

A granularity form can reference an extra formal parameter that describes the call depth of the function the user is parallelizing. Consider for example the following parallel mergesort function, based on Davis's Ordered Sets library. It splits the data into symmetric chunks for computation, so we increment the depth argument during the recursive call on both the car and cdr.

(include-book "finite-set-theory/osets/sets" :dir :system)
(defun sets::pmergesort-exec (x depth)
  (declare (xargs :mode :program))
  (cond ((endp x) nil)
        ((endp (cdr x)) (sets::insert (car x) nil))
        (t (mv-let (part1 part2)
                   (sets::split-list x nil nil)
                    (declare (granularity (< depth 2)))
                    (sets::union (sets::pmergesort-exec part1
                                                        (1+ depth))
                                 (sets::pmergesort-exec part2
                                                        (1+ depth))))))))

A less intrusive method (i.e., not requiring an extra formal parameter like the depth argument just above), which however can be less efficient, involves analyzing the data itself for structural properties. For example:

(defun some-depth-exceeds (x n)
  (declare (xargs :guard (natp n)))
  (if (atom x)
    (if (zp n)
      (or (some-depth-exceeds (car x) (1- n))
          (some-depth-exceeds (cdr x) (1- n))))))

(defun valid-tip (x) (declare (xargs :guard t)) (or (eq x 'A) (eq x 'T) (eq x 'C) (eq x 'G)))

(defun pvalid-tree (x) (declare (xargs :guard t)) (if (atom x) (valid-tip x) (pand (declare (granularity (some-depth-exceeds x 3))) (pvalid-tree (car x)) (pvalid-tree (cdr x)))))

If you experiment with calls of pvalid-tree, you are likely to find that the ``speedup'' it provides over a corresponding serial version is, in fact, a slowdown! The problem is likely that some-depth-exceeds is an expensive function to run repeatedly. Instead of the approach above, it is often handy to add an extra formal parameter in order to allow for more efficient granularity forms, as we have done above in the definition of SETS::pmergesort-exec.