CS 394P: Data Flow Analysis

Due: February 21, 2005.

This assignment is to write some basic programs for data flow analysis.

Examine the code in the file dataflow.lsp . This file contains some helper functions and some test data.

We will represent a basic block as a node containing code as a progn of statements. Each node also has a node number, the numbers of its successor nodes, and places to hold the data flow information. Data flow information will be represented as sets of symbols, where each symbol is either a variable or a compiler variable that represents a subexpression. You can use the Lisp functions union, intersection, and set-difference to manipulate these sets.

Call the function (initexprs) once to initialize the expression table, *exprs* . (initexprs) will associate each subexpression with a compiler variable, which will be a generated symbol. After calling (initexprs), you can look at *exprs* to see what it produced.
Write a function (comp-kill node) to create the computed and killed sets due to the code in a node, and save the results in the node data structure. You can use (setf (computed node) ...) and (setf (killed node) ...) to store the results. Each of these sets will be represented as a list of symbols.
Write a loop to call comp-kill on all nodes.
Write a function (avail-entry node) to compute what is available on entry to a node from what is available on exit from its predecessors. You can assume that (avail-exit node) has been stored; initially, of course, it will be empty.
Write a function to iteratively compute (avail-exit node) based on (avail-entry node) and what is computed and killed within the node. Continue updating these values until there is no further change.
Write a function to iterate through all nodes and identify redundant parts of the code. For each node, print out the node number and iterate through each statement of the code. Keep an available set that is available-on-entry initially, and update it for each statement. If a statement involves some redundant code, print out the statement and the redundant expression.

You will probably find that not all of the subexpressions that are actually redundant are identified by your algorithm. The remaining ones can be found using more sophisticated methods.

The following functions can be used in writing your code:

(predecessors node) returns a list of node numbers of the predecessors of node.
(expcomputed exp) returns the set of subexpressions that are computed in the process of computing exp.
(killedby var) returns the set of subexpressions that are killed by redefinition of var.