Gremlin Bytecode to IrGL

1 Project Contacts

Sreepathi Pai and Keshav Pingali

2 Project Description

One of the most notable aspects of the current big data "revolution" has been an increased interest in non-relational database paradigms. Of these, databases that store graphs and support queries on them are increasingly important. Examples of graphs stored in these databases include social networks, road networks, protein interaction networks, and chemical reaction networks.

The Apache TinkerPop project has created "Gremlin", a graph traversal and query language. Gremlin queries compile down to a custom bytecode, which is then interpreted by various backends such as Apache Spark, IBM Graph, and Neo4j.
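To make the bytecode idea concrete, here is a minimal sketch in Python. It assumes a simplified model in which a traversal such as g.V().out("knows").count() is represented as a list of (operator, arguments) instructions. The interpret function, the toy edge list, and the three supported steps are illustrative assumptions; they are not the TinkerPop API or its actual bytecode serialization.

```python
from collections import defaultdict

# Hypothetical toy graph: edges stored as (source, label, target) triples.
EDGES = [
    (1, "knows", 2),
    (1, "knows", 3),
    (2, "knows", 3),
    (3, "created", 4),
]

# Simplified stand-in for the bytecode of g.V().out("knows").count().
QUERY = [("V",), ("out", "knows"), ("count",)]


def build_adjacency(edges):
    """Map (source, label) to the list of target vertices."""
    adjacency = defaultdict(list)
    for src, label, dst in edges:
        adjacency[(src, label)].append(dst)
    return adjacency


def all_vertices(edges):
    """Return every vertex id that appears in the edge list, sorted."""
    found = set()
    for src, _, dst in edges:
        found.add(src)
        found.add(dst)
    return sorted(found)


def interpret(bytecode, edges):
    """Run each instruction in order over the current list of traversers."""
    adjacency = build_adjacency(edges)
    traversers = []
    for op, *args in bytecode:
        if op == "V":
            # Start one traverser at every vertex.
            traversers = all_vertices(edges)
        elif op == "out":
            # Move each traverser along outgoing edges with the given label.
            label = args[0]
            traversers = [
                dst
                for v in traversers
                for dst in adjacency.get((v, label), [])
            ]
        elif op == "count":
            # Reduce all traversers to a single number.
            traversers = [len(traversers)]
        else:
            raise ValueError(f"unsupported step: {op}")
    return traversers
```

Calling interpret(QUERY, EDGES) walks the instruction list one step at a time, which is the role a backend plays for real Gremlin bytecode.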

This project involves compiling Gremlin queries to the IrGL programming language for GPUs. The IrGL programming language is a research language for generating high-performance graph applications. Starting from a high-level input, the IrGL compiler applies several optimizations to produce very fast CUDA code.
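One way to picture the target of such a compilation: a traversal step like out() can be treated as a data-parallel expansion of a frontier of vertices over a graph in compressed sparse row (CSR) form. The sketch below is written in plain Python, not IrGL, and is not output of the IrGL compiler; the function names to_csr and expand_out are hypothetical. The outer loop in expand_out marks the work that a GPU kernel would typically distribute across threads.

```python
def to_csr(num_vertices, edges):
    """Build CSR arrays (row_start, edge_dst) from (src, dst) pairs."""
    # Count the out-degree of each vertex, shifted by one slot.
    row_start = [0] * (num_vertices + 1)
    for src, _ in edges:
        row_start[src + 1] += 1
    # Prefix sum turns the counts into starting offsets.
    for v in range(num_vertices):
        row_start[v + 1] += row_start[v]
    # Scatter each destination into its source vertex's slice.
    edge_dst = [0] * len(edges)
    next_slot = row_start[:-1]  # slicing copies, so row_start is untouched
    for src, dst in edges:
        edge_dst[next_slot[src]] = dst
        next_slot[src] += 1
    return row_start, edge_dst


def expand_out(frontier, row_start, edge_dst):
    """Return the neighbors reached by one out() step from the frontier."""
    next_frontier = []
    for v in frontier:  # on a GPU: roughly one thread per frontier vertex
        for e in range(row_start[v], row_start[v + 1]):
            next_frontier.append(edge_dst[e])
    return next_frontier
```

Chaining two calls to expand_out corresponds to a two-hop traversal; duplicates are kept, mirroring how Gremlin traversers are not deduplicated unless the query asks for it.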

GPUs are attractive for databases because of the large memory bandwidth they offer (10x that of the CPU), a gap that will only widen. This is an emerging and exciting area of research with numerous high-value applications.

One goal of this project is to demonstrate execution of the queries in the "Ruminations on SparkGraphComputers" benchmark on the Friendster graph. As a stretch goal, the implementors could also explore query optimization on graph programs.

This project expects familiarity with CUDA/GPU computing as a prerequisite.

3 Project Deliverables

  1. (Nov 1) Description of high-level transformations of Gremlin queries to IrGL programs.
  2. (Dec 6) A compiler that implements the Gremlin-to-IrGL transformations you described.
  3. (Dec 6) A project report in the ACM SIGPLAN style (max 10 pages).

4 References

5 Acknowledgments

Dylan Bethune-Waddell provided the focus on the Gremlin query language as well as the pointer to the "Ruminations on SparkGraphComputers" benchmark.