CS 377P: Programming for Performance
Assignment 3: Operator formulation of algorithms
Due date: March 7th, 2017
Late submission policy: Submissions can be at most 2
days late. There will be a 10% penalty for each day after the due
date (cumulative).
Clarifications to the assignment are posted at the bottom of
the page.
Description
This assignment introduces you to the operator formulation of
algorithms. The motto introduced in class is Algorithm =
Operator + Schedule, and in this assignment, you will
implement sequential algorithms for the single-source shortest-path
(sssp) problem to understand this motto. Read the entire assignment
before you start coding, and get started early: this assignment
requires more programming than previous assignments.
Key concepts
Recall that we classify algorithms into topology-driven and
data-driven algorithms.
Topology-driven algorithms make a number of sweeps over the graph.
At the start of the algorithm, node labels are initialized as needed
by the algorithm (for example, for sssp, the label of the source
node is initialized to zero and the labels of all other nodes are
initialized to $\infty$). In
each sweep, the operator is applied to all nodes. The algorithm
terminates when a sweep does not modify the label of any node. In
some problems, particularly those in which labels are floating point
numbers, we may never get to exact convergence so we terminate the
algorithm when node updates are below some threshold or when some
upper bound on the number of iterations is reached.
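As a rough illustration of the topology-driven scheme for sssp, here is a minimal C++ sketch. The `Edge` struct, the function name `topologyDrivenSssp`, the 0-based node numbers, and the use of per-node adjacency vectors instead of CSR are all assumptions made for this sketch and are not part of the assignment. It also assumes non-negative integer edge weights, with INT_MAX standing in for $\infty$.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical edge type for this sketch: destination node and integer weight.
struct Edge {
    int dst;
    int weight;
};

// Topology-driven sssp: sweep over all nodes until a sweep changes no label.
// adj[u] holds the outgoing edges of node u (0-based node numbers here).
std::vector<int> topologyDrivenSssp(const std::vector<std::vector<Edge>>& adj,
                                    int source) {
    assert(source >= 0 && source < static_cast<int>(adj.size()));
    std::vector<int> dist(adj.size(), INT_MAX);  // INT_MAX stands in for infinity
    dist[source] = 0;

    bool changed = true;
    while (changed) {  // each iteration of this loop is one sweep
        changed = false;
        for (std::size_t u = 0; u < adj.size(); ++u) {
            if (dist[u] == INT_MAX) continue;  // unreached node: nothing to push
            for (const Edge& e : adj[u]) {
                // Widen to long long so dist[u] + weight cannot overflow int.
                long long candidate = static_cast<long long>(dist[u]) + e.weight;
                if (candidate < dist[e.dst]) {
                    dist[e.dst] = static_cast<int>(candidate);
                    changed = true;
                }
            }
        }
    }
    return dist;
}
```

Note that every sweep visits every node, even those whose labels can no longer change; the data-driven formulation avoids that wasted work.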
Data-driven algorithms maintain a work-list of active nodes. The
work-list can be considered to be an abstract data type (class) that
supports two methods: put and get. Active nodes are
added to the work-list by invoking the put method with the
set of active nodes. The work-list can be maintained either as a set
(so no duplicates are allowed) or as a multi-set (duplicates are
allowed). In this assignment, work-lists can be implemented as
multi-sets so you do not need to check for duplicates. The get
method returns an active node from the work-list if it is not empty,
and removes it from the work-list. If there are multiple active nodes
in the work-list, the schedule determines which one is returned.
Applying the operator to an active node may change the labels of
other nodes in the graph; if so, these nodes become active and are
added to the work-list. For problems in which labels are
floating-point numbers, we may choose not to activate a node if the
change to its label is below some threshold. Data-driven algorithms
terminate when the work-list is empty and all active nodes have been
processed.
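The separation between operator and schedule can be sketched in C++ roughly as follows. The class names `WorkList` and `FifoWorkList`, the function `dataDrivenSssp`, the `Edge` struct, and the single-node `put` (rather than a put that takes a set of nodes) are assumptions made for illustration. The first-in, first-out policy is just one possible schedule, chosen here because it is short to write.

```cpp
#include <cassert>
#include <climits>
#include <deque>
#include <vector>

// Hypothetical edge type for this sketch.
struct Edge {
    int dst;
    int weight;
};

// Abstract work-list: the schedule is whatever policy get() implements.
class WorkList {
public:
    virtual ~WorkList() = default;
    virtual void put(int node) = 0;  // add one active node (duplicates allowed)
    virtual int get() = 0;           // remove and return some active node
    virtual bool empty() const = 0;
};

// One possible schedule: first-in, first-out.
class FifoWorkList : public WorkList {
public:
    void put(int node) override { nodes_.push_back(node); }
    int get() override {
        assert(!nodes_.empty());
        int node = nodes_.front();
        nodes_.pop_front();
        return node;
    }
    bool empty() const override { return nodes_.empty(); }

private:
    std::deque<int> nodes_;
};

// Data-driven sssp: apply the relaxation operator to active nodes until the
// work-list is empty. The routine does not know which schedule is in use.
std::vector<int> dataDrivenSssp(const std::vector<std::vector<Edge>>& adj,
                                int source, WorkList& wl) {
    assert(source >= 0 && source < static_cast<int>(adj.size()));
    std::vector<int> dist(adj.size(), INT_MAX);
    dist[source] = 0;
    wl.put(source);
    while (!wl.empty()) {
        int u = wl.get();
        for (const Edge& e : adj[u]) {
            long long candidate = static_cast<long long>(dist[u]) + e.weight;
            if (candidate < dist[e.dst]) {
                dist[e.dst] = static_cast<int>(candidate);
                wl.put(e.dst);  // label changed, so the node becomes active
            }
        }
    }
    return dist;
}
```

Passing a different `WorkList` subclass changes the schedule, and therefore the algorithm, without touching `dataDrivenSssp`.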
Graph formats
Input graphs will be given to you in DIMACS format,
which is described at the end of this assignment. The output for
each algorithm should be produced as a text file containing one line
for each node, specifying the number of the node and the label of
that node.
- You can find all graphs for this assignment on Stampede here:
/work/01131/rashid/class-inputs .
- We have provided the following graphs for sssp: power-law
graphs rmat15, rmat20, rmat22, and rmat23,
and road networks road-FLA (Florida road network) and road-NY
(New York road network). Graphs like rmat22 and rmat23 are quite
big so do not do any runs with them until your code has been
debugged on some small graphs that you have constructed.
Coding
- I/O routines for graphs: These routines
will be important for debugging your programs so make sure they
are working before starting the rest of the assignment.
- Write a C++ routine that reads a graph in DIMACS format from
a file, and constructs a Compressed-Sparse-Row (CSR)
representation of that graph in memory. Node and edge labels
can be ints for the graphs we are dealing with.
- Write a C++ routine that takes a graph in CSR representation
in memory, and prints it out to a file in DIMACS format.
- Write a C++ routine that takes a graph in CSR representation
in memory, and prints node numbers and node labels, one per
line.
- Data-driven algorithms: Implement a routine that takes
a graph G and a work-list w of active nodes as
input, and performs a data-driven sssp computation on graph G.
By passing different work-lists to this routine as described
below, you can implement different data-driven algorithms for
sssp without changing the code in your routine. Instrument your
code to count the number of node and edge relaxations.
- Graph initialization: read in the graph from the
file, create the graph in CSR format in memory, and initialize
node labels so that the source node has label 0 and all other
nodes are initialized to a large positive number (you can use
INT_MAX).
- Chaotic relaxation sssp algorithm:
- Implement a work-list called bag. The get method
for this work-list should select a random active node from the
nodes in the work-list.
- You can use the rand function in C++ to generate
random numbers; this webpage shows you how to generate
random numbers within a particular range: http://www.cplusplus.com/reference/cstdlib/rand/ .
By using different seeds, you can generate different
sequences of random numbers.
- Chaotic relaxation can take a very long time even for
small graphs for some schedules of node relaxations. Your
code should terminate the computation if the number of
relaxations exceeds some bound that depends on the size of
the graph.
- Delta-stepping sssp algorithm:
- Implement a work-list as a sequence of bags in
which the first bag contains nodes with labels in the
interval [0,Δ), the second bag contains nodes with labels in
the interval [Δ,2Δ), etc. The get
method should return a random node from the first non-empty
bag. The value of Δ should be a parameter to the constructor
for your work-list. For efficiency, your work-list can keep
track of the first non-empty bag instead of searching the
bags one at a time to find the first non-empty bag.
- Setting Δ to one in the delta-stepping algorithm gives you
Dijkstra's algorithm. You may get better performance by
using a heap to implement the work-list but you do not need
to implement this.
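The two work-lists described above might be sketched in C++ as follows. The class names `Bag` and `DeltaWorkList`, the member names, and the choice to pass the node's current label to `put` (so the work-list can pick the right bag) are assumptions made for this sketch, not a required interface. Labels are assumed to be non-negative ints.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

// Bag: get() removes and returns a pseudo-randomly chosen node (via std::rand).
class Bag {
public:
    void put(int node) { nodes_.push_back(node); }
    bool empty() const { return nodes_.empty(); }
    int get() {
        assert(!nodes_.empty());
        std::size_t i = static_cast<std::size_t>(std::rand()) % nodes_.size();
        std::swap(nodes_[i], nodes_.back());  // move the chosen node to the end
        int node = nodes_.back();
        nodes_.pop_back();  // constant-time removal
        return node;
    }

private:
    std::vector<int> nodes_;
};

// Sequence of bags: bag b holds nodes whose label was in [b*delta, (b+1)*delta)
// at the time of the put.
class DeltaWorkList {
public:
    explicit DeltaWorkList(int delta) : delta_(delta) { assert(delta > 0); }

    // The caller supplies the node's current label so the bag can be chosen.
    void put(int node, int label) {
        assert(label >= 0);
        std::size_t b = static_cast<std::size_t>(label / delta_);
        if (b >= bags_.size()) bags_.resize(b + 1);
        bags_[b].put(node);
        if (b < first_) first_ = b;  // a lower bag just became non-empty
        ++size_;
    }

    bool empty() const { return size_ == 0; }

    // Returns a random node from the first non-empty bag.
    int get() {
        assert(size_ > 0);
        while (bags_[first_].empty()) ++first_;  // skip exhausted bags
        --size_;
        return bags_[first_].get();
    }

private:
    int delta_;
    std::vector<Bag> bags_;
    std::size_t first_ = 0;  // every bag below this index is empty
    std::size_t size_ = 0;   // total number of nodes over all bags
};
```

Calling `std::srand` with different seeds before the run gives different `std::rand` sequences. Because the work-list is a multi-set, a node can remain in an older bag after its label has dropped; relaxing it again from there is redundant but harmless.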
Experiments
Data-driven sssp algorithms
- graphs: rmat15, rmat20, rmat22, rmat23, road-NY,
road-FL.
- [updated] source node for sssp computation:
node 1 for all rmat graphs, node 140961 for road-NY, node
316607 for road-FL. These are the nodes with
the highest degree.
- Draw two small graphs with roughly 5 nodes and 20 edges, and
generate files for them in DIMACS format. You should use these
graphs to debug your code before using the bigger graphs we have
provided to you.
- Submit these two graphs with your report.
- Write a routine that traverses a graph in CSR format and
determines the number of the node with the largest out-degree.
This is an exercise to check that you understand the CSR format
and know how to use it for graph algorithms.
- Report this node number for each of the graphs given to
you (you should check that this is the same as the source
node for sssp described above).
- Chaotic relaxation:
- Experiment with three different seeds for the random number
generator.
- Report the running times, the number of node relaxations,
and the number of edge relaxations for rmat15. If your code
timed out, put some symbol like "*" in the table for that
experiment.
- Dijkstra's algorithm:
- Run Dijkstra's algorithm on rmat15 and road-NY.
- Report the number of node relaxations.
- Compute analytically what this number should be, and
compare it with the number from your experiment.
- Output the final node labels for both graphs in the format
specified in the Graph Formats section of this
assignment.
- Delta-stepping:
- Determine experimentally the optimal values of Δ
for rmat15 and for road-NY, and report these in your
submission.
- Output the final node labels for both graphs.
- Use the Δ value you found for rmat15 to perform sssp for all the rmat
graphs. Plot a graph in which the x-axis is the number of
nodes in the rmat graph and the y-axis is the running time.
- Plot a similar graph for the number of node relaxations.
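For the out-degree exercise in the list above, one possible shape of the traversal is sketched below. The `CsrGraph` struct and its field names are hypothetical, and the sketch uses 0-based node indices; if your nodes are numbered from 1, as in the source nodes listed above, adjust the returned index accordingly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR layout for this sketch (0-based node indices):
// the edges of node u occupy positions rowStart[u] .. rowStart[u + 1] - 1.
struct CsrGraph {
    std::vector<int> rowStart;  // size numNodes + 1
    std::vector<int> dst;       // size numEdges
    std::vector<int> weight;    // size numEdges
};

// Returns the index of the node with the largest out-degree.
// When several nodes tie, the lowest index is returned.
int maxOutDegreeNode(const CsrGraph& g) {
    assert(g.rowStart.size() >= 2);  // the graph has at least one node
    int best = 0;
    int bestDegree = g.rowStart[1] - g.rowStart[0];
    for (std::size_t u = 1; u + 1 < g.rowStart.size(); ++u) {
        // In CSR, the out-degree is the difference of adjacent row offsets.
        int degree = g.rowStart[u + 1] - g.rowStart[u];
        if (degree > bestDegree) {
            bestDegree = degree;
            best = static_cast<int>(u);
        }
    }
    return best;
}
```

The edge arrays are never read: the out-degrees are fully determined by the row offsets.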
Submission
Submit (on Canvas) your code and all the items listed in the
experiments above.
Grading
Code: 50 points
Experiments: 50 points
DIMACS format for graphs
One popular format for representing directed graphs as
text files is the DIMACS
format (an undirected graph is represented as a directed graph by
representing each undirected edge as two directed edges). Files
are assumed to be well-formed and internally consistent so it is
not necessary to do any error checking. A line in a file
must be one of the following.
Notes added after assignment was posted:
- (2/21, 2:13 PM): You may use
classes from the C++ STL and boost libraries if you
wish.
- (2/22, 5:36 PM): I changed the definition of edges in
the DIMACS format. Edges in the file start with "a" (for
arc).
- (2/25, 12:09 PM): Because of the
generator used for rmat graphs, the files for some of
the graphs may have multiple edges between the same pair
of nodes. When building the CSR representation in
memory, keep only the edge with the largest weight. For
example, if you find edges (s d 1) and (s d 4),
keep only the edge with weight 4. In principle, you
can keep the smallest weight edge or follow some other
rule, but I want everyone to follow the same rule to
make grading easier. This has been discussed twice on
Piazza as well, but feel free to post there if this is
not clear.
- (3/3, 6:04 PM): Source nodes for
SSSP computations have been updated above and on Piazza.
- (3/4, 2:00 PM/8:10 PM): Here is the solution to the rmat15
sssp problem.
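The duplicate-edge rule from the 2/25 note (keep only the largest weight for each pair of nodes) can be applied while building the CSR arrays. The sketch below assumes the arcs have already been parsed into a vector; the `Arc` and `CsrGraph` structs, the function name `buildCsr`, and the 0-based endpoints are hypothetical choices for illustration, and sorting is only one of several ways to find duplicates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parsed arc: source, destination, weight (0-based endpoints).
struct Arc {
    int src;
    int dst;
    int weight;
};

// Hypothetical CSR layout: edges of node u are at rowStart[u] .. rowStart[u+1]-1.
struct CsrGraph {
    std::vector<int> rowStart;  // size numNodes + 1
    std::vector<int> dst;       // size numEdges after duplicates are dropped
    std::vector<int> weight;
};

// Builds a CSR graph, keeping only the largest-weight arc per (src, dst) pair.
CsrGraph buildCsr(int numNodes, std::vector<Arc> arcs) {
    // Sort by (src, dst); among duplicates, the heaviest arc comes first.
    std::sort(arcs.begin(), arcs.end(), [](const Arc& a, const Arc& b) {
        if (a.src != b.src) return a.src < b.src;
        if (a.dst != b.dst) return a.dst < b.dst;
        return a.weight > b.weight;
    });

    CsrGraph g;
    g.rowStart.assign(static_cast<std::size_t>(numNodes) + 1, 0);
    for (std::size_t i = 0; i < arcs.size(); ++i) {
        const Arc& a = arcs[i];
        assert(a.src >= 0 && a.src < numNodes);
        assert(a.dst >= 0 && a.dst < numNodes);
        // Skip every copy of a (src, dst) pair except the first (heaviest).
        if (i > 0 && arcs[i - 1].src == a.src && arcs[i - 1].dst == a.dst) {
            continue;
        }
        g.dst.push_back(a.dst);
        g.weight.push_back(a.weight);
        ++g.rowStart[a.src + 1];  // count the kept edges of each source node
    }
    // A prefix sum turns the per-node counts into starting offsets.
    for (int u = 0; u < numNodes; ++u) {
        g.rowStart[u + 1] += g.rowStart[u];
    }
    return g;
}
```

With the arcs (s d 1) and (s d 4) from the note's example, only the arc of weight 4 survives.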