Parallelization of Asynchronous Variational Integrators on Distributed Memory Systems

Contacts

M. Amber Hassaan

Abstract

In the operator formulation, ordered algorithms are algorithms in which there is an application-specific order in which activities must appear to have been executed. Time-dependent simulations like discrete-event simulation are the classical examples: in these applications, the simulation must respect the time-order of events even if events are processed in parallel.
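As a point of reference, the time-order constraint can be illustrated with a minimal sequential sketch: a priority queue keyed by timestamp, where processing one event may schedule later ones. The names `run_ordered` and `ping_pong` are hypothetical and chosen only for this illustration; they are not part of Galois or the AVI code.

```python
import heapq


def run_ordered(initial_events, handler):
    """Process (time, name) events in non-decreasing time order.

    `handler(time, name)` returns a list of newly created (time, name) events.
    """
    queue = list(initial_events)
    heapq.heapify(queue)
    processed = []
    while queue:
        time, name = heapq.heappop(queue)  # always the earliest pending event
        processed.append((time, name))
        for new_event in handler(time, name):
            heapq.heappush(queue, new_event)
    return processed


def ping_pong(time, name):
    # Hypothetical handler: each event schedules one follow-up until t = 3.0.
    next_time = time + 1.0
    return [(next_time, name)] if next_time <= 3.0 else []


if __name__ == "__main__":
    for time, name in run_ordered([(0.5, "b"), (0.0, "a")], ping_pong):
        print(time, name)
```

A parallel implementation may process several events at once, but its result must be indistinguishable from this sequential order.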
Ordered algorithms are far harder to parallelize effectively than unordered algorithms. One promising approach, explored in Amber Hassaan's dissertation, is to use the Kinetic Dependence Graph (KDG) [1,2]. The KDG generalizes standard dependence graphs to applications in which executing a task may create new tasks. We have used the KDG to parallelize complex simulation codes such as asynchronous variational integrators (AVIs), which are finite-element codes that are irregular in both space and time (no knowledge of finite elements is needed for this project).
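The following sketch is a simplified illustration of the kind of dependence test involved, under stated assumptions; it is not the KDG implementation in Galois. It assumes each mesh element carries a timestamp and a fixed time step, that two elements conflict when they are neighbors, and that an element may be updated safely when its (timestamp, id) pair is the smallest among its neighbors. Updating an element advances its timestamp, which plays the role of the newly created task. The names `safe_sources` and `simulate` are hypothetical.

```python
def safe_sources(timestamps, neighbors):
    """Return ids of elements whose (timestamp, id) is smallest among their neighbors."""
    return [
        e for e in timestamps
        if all((timestamps[e], e) < (timestamps[n], n) for n in neighbors[e])
    ]


def simulate(timestamps, steps, neighbors, end_time):
    """Advance all elements to end_time; return the elements updated in each round.

    Elements within one round are pairwise non-adjacent, so a real
    implementation could update them in parallel.
    """
    timestamps = dict(timestamps)  # work on a copy
    rounds = []
    while any(t < end_time for t in timestamps.values()):
        ready = [e for e in safe_sources(timestamps, neighbors)
                 if timestamps[e] < end_time]
        for e in ready:
            timestamps[e] += steps[e]  # the update re-schedules the element
        rounds.append(ready)
    return rounds


if __name__ == "__main__":
    # Hypothetical 1-D "mesh": four elements in a chain 0 - 1 - 2 - 3.
    nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    dt = {0: 1.0, 1: 0.5, 2: 1.0, 3: 0.5}
    print(simulate({e: 0.0 for e in nbrs}, dt, nbrs, end_time=1.0))
    # prints [[0], [1], [2], [1, 3], [3]]
```

In the fourth round, elements 1 and 3 are updated together because neither is a neighbor of the other; this is the parallelism that an ordered executor tries to expose.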
At present, KDGs are implemented in the shared-memory version of Galois, and we have an implementation of AVI that performs quite well on large-scale shared-memory machines. The goal of this project is to implement a distributed-memory version of AVI using KDGs. The basic idea is to partition the mesh among the hosts of a distributed-memory computer (we will give you a mesh partitioner), and let each host simulate its portion of the mesh (this can be done in parallel using shared-memory parallelism). Naturally, the KDG has to be implemented in a partitioned way as well. Synchronization between hosts will be needed to update the mesh and the KDG in a consistent way.
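One way to start thinking about the partitioned setting is to separate the elements a host can process using only local information from those that touch another host's portion of the mesh. The sketch below assumes the partitioner yields a map from element id to owning host; that format, and the name `classify_elements`, are assumptions made for illustration only.

```python
def classify_elements(owner, neighbors):
    """Split each host's elements into interior and boundary lists.

    owner:     dict mapping element id -> host id (assumed partitioner output)
    neighbors: dict mapping element id -> list of adjacent element ids
    """
    interior, boundary = {}, {}
    for e, host in owner.items():
        has_remote_neighbor = any(owner[n] != host for n in neighbors[e])
        target = boundary if has_remote_neighbor else interior
        target.setdefault(host, []).append(e)
    return interior, boundary


if __name__ == "__main__":
    # Hypothetical chain 0 - 1 - 2 - 3 split across two hosts.
    owner = {0: 0, 1: 0, 2: 1, 3: 1}
    nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    interior, boundary = classify_elements(owner, nbrs)
    print("interior:", interior)  # interior: {0: [0], 1: [3]}
    print("boundary:", boundary)  # boundary: {0: [1], 1: [2]}
```

Under these assumptions, only boundary elements would need inter-host synchronization before an update; interior elements could be handled entirely by shared-memory parallelism within the host.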

Project deliverables and deadlines

  1. (Nov 1st) Read the papers; download, compile, and run Galois (beta release) and the AVI application.
  2. (Nov 8th) A clear description in English of the overall parallelization strategy.
  3. (Nov 15th) An implementation of AVI in MPI, using shared-memory Galois within each host.
  4. (Dec 6th) An implementation in the distributed-memory version of Galois (Abelian).
  5. (Dec 6th) A project report describing AVI and its distributed-memory implementation. A detailed comparison of performance on a distributed machine (e.g., TACC) for large inputs.
  6. A project report, written like an ACM paper, describing what you did for your project.

Papers

  1. Brief Announcement: Parallelization of Asynchronous Variational Integrators for Shared Memory Architectures, Hassaan, Nguyen and Pingali, SPAA’2014
  2. The tao of parallelism in algorithms. Keshav Pingali, Donald Nguyen, Milind Kulkarni, Martin Burtscher, M. Amber Hassaan, Rashid Kaleem, Tsung-Hsien Lee, Andrew Lenharth, Roman Manevich, Mario Méndez-Lojo, Dimitrios Prountzos, and Xin Sui. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '11, pages 12-25, 2011.
  3. Kinetic Dependence Graphs, Hassaan, Nguyen and Pingali, ASPLOS’2015