Parallel & Distributed Pattern Mining on Large Graphs

Project contacts: Xuhao Chen (cxh@utexas.edu)
Project description: Writing efficient parallel & distributed programs for graph mining [3,4] is challenging. Galois [1] is a parallel & distributed system that simplifies graph analytics programming. However, Galois currently lacks support for graph mining applications, because these applications maintain intermediate data structures, beyond the graph itself, that are accessed and updated concurrently during execution. The goal of this project is to take a graph mining problem (e.g., k-clique [5] or k-motif [6]), then design and implement an efficient parallel & distributed algorithm for it, tailored to a specific platform. The design should exploit communication-computation overlap and load balancing. An extra task is to develop a simple interface, extending the Gluon [2] API, that makes distribution transparent to the high-level programmer [3].
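To make the target workload concrete, here is a minimal shared-memory sketch of k-clique counting, loosely following the orient-then-intersect approach of [5]. It is an illustration under stated assumptions, not the project's or Galois's implementation. The input is assumed to be a simple undirected edge list (no self-loops or duplicate edges). Vertices are ranked by (degree, id), a simpler stand-in for the degeneracy ordering used in [5]. All function names are hypothetical. Each root vertex is an independent task, and its candidate sets (the intermediate data beyond the graph itself) are task-private, so tasks only read the shared DAG. The 3-motif helper reuses the triangle count to split connected 3-vertex subgraphs into triangles and open wedges.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from math import comb


def orient_by_degree(edges):
    """Orient each undirected edge from its lower-ranked to its higher-ranked
    endpoint, ranking vertices by (degree, id). Assumes a simple graph. The
    result is a DAG, so every clique is found once, from its lowest-ranked
    vertex."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    out = defaultdict(set)
    for u, v in edges:
        if (degree[u], u) < (degree[v], v):
            out[u].add(v)
        else:
            out[v].add(u)
    return dict(out)  # plain dict: concurrent tasks only read it


def count_from_root(dag, root, k):
    """Count k-cliques whose lowest-ranked vertex is `root`. Candidate sets are
    local to this task: no shared mutable state, hence no locking."""
    def extend(candidates, size):
        # `size` vertices are already in the partial clique; every vertex in
        # `candidates` is adjacent to all of them and ranked above them.
        if size == k:
            return 1
        if size == k - 1:
            return len(candidates)  # each candidate completes one clique
        return sum(extend(candidates & dag.get(v, set()), size + 1)
                   for v in candidates)
    return extend(dag.get(root, set()), 1)


def count_k_cliques(edges, k, workers=4):
    """Count k-cliques by running one independent task per root vertex."""
    if k < 1:
        raise ValueError("k must be at least 1")
    dag = orient_by_degree(edges)
    vertices = {x for edge in edges for x in edge}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda r: count_from_root(dag, r, k), vertices))


def three_motifs(edges):
    """Split connected 3-vertex induced subgraphs into (triangles, open wedges):
    a vertex of degree d centres comb(d, 2) length-2 paths, and each triangle
    accounts for exactly 3 of those paths."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    triangles = count_k_cliques(edges, 3)
    paths = sum(comb(d, 2) for d in degree.values())
    return triangles, paths - 3 * triangles


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_k_cliques(k4, 3), count_k_cliques(k4, 4))  # 4 1
print(three_motifs(k4))  # (4, 0)
```

The threads only illustrate the per-root task decomposition. CPython's global interpreter lock prevents real speedup here. A Galois or distributed version would also need the load balancing and communication-computation overlap the project calls for, since roots with high out-degree generate far more work than others.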

Project deliverables and deadlines:

Papers:
[1] The Tao of Parallelism in Algorithms, PLDI’11,
http://iss.ices.utexas.edu/Publications/Papers/pingali11.pdf

[2] Gluon: A Communication-Optimizing Substrate for Distributed Heterogeneous Graph Analytics, PLDI’18,
https://dl.acm.org/citation.cfm?id=3192404

[3] Arabesque: A System for Distributed Graph Mining, SOSP’15,
https://dl.acm.org/citation.cfm?id=2815410

[4] DistTC: High Performance Distributed Triangle Counting, Graph Challenge 2019,
https://www.cs.utexas.edu/~roshan/DistTC.pdf

[5] Listing k-cliques in Sparse Real-World Graphs, WWW’18,
https://dl.acm.org/citation.cfm?id=3186125

[6] Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks, CIKM’16,
https://dl.acm.org/citation.cfm?id=2983832