Dynamic Memory Allocator for a GPU

Project contacts

Hochan Lee (hochan@utexas.edu), Roshan Dathathri (roshan@cs.utexas.edu)

Overview

GPUs have become a popular platform for accelerating many applications, but the memory available on the device is limited. To address this, modern GPU architectures provide Unified Memory (UM) [1] [2] [3]. A UM allocation lives in a single address space that both the CPU and the GPU can access; the runtime decides whether pages reside in host or device memory and migrates them on demand. Using UM, GPU code therefore has access to a much larger memory pool than device memory alone.
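To make the UM programming model concrete, here is a minimal sketch of how a UM allocation is typically obtained with cudaMallocManaged and then touched by both the host and the device. The kernel and variable names are illustrative, not part of the project specification.

#include <cstdio>
#include <cuda_runtime.h>

// Simple kernel that increments every element of the array on the GPU.
__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1 << 20;
    int *data = nullptr;

    // cudaMallocManaged returns a pointer that is valid on both the CPU
    // and the GPU; the runtime migrates pages between them on demand.
    cudaMallocManaged(&data, n * sizeof(int));

    // Initialize on the host ...
    for (int i = 0; i < n; ++i) data[i] = i;

    // ... update on the device ...
    increment<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();

    // ... and read the result back on the host without explicit copies.
    printf("data[0] = %d, data[n-1] = %d\n", data[0], data[n - 1]);

    cudaFree(data);
    return 0;
}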

Dynamic data structures like hash tables are expressed quite easily in languages like C++ for host CPUs because CPUs support efficient dynamic memory allocation. There are also libraries like Galois that provide concurrent and scalable memory allocators. These allocators use a memory pool per thread and manage memory by always handing out (padded) allocations in fixed chunk sizes (for example, powers of 2) [4]. In contrast, there are no efficient dynamic memory allocators for GPUs. In this project, you will build a dynamic memory allocator for a GPU using UM. This includes surveying the literature for existing memory management solutions for CPUs and GPUs [5][6]. The goal is to implement a dynamic data structure, such as a dynamic hash table, on top of the allocator. There is recent work [7] that implements a dynamic hash table on the GPU, but it is limited and does not support UM. Your implementation should be more general and more efficient than that solution.
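As a starting point, the sketch below shows one way a fixed-chunk-size pool in the spirit of [4] might be backed by UM and shared between the host and device. This is only an illustrative design under assumptions of our own (the ChunkPool struct, the create_pool helper, and a single atomic bump pointer are not prescribed by the project); it never reuses freed chunks, so a real allocator would add per-thread pools and free lists as the CPU-side allocators in [4] do.

#include <cstddef>
#include <cuda_runtime.h>

// A UM-backed pool that hands out fixed-size chunks with a single atomic
// bump pointer. Chunk size and pool capacity are illustrative parameters.
struct ChunkPool {
    char               *base;       // start of the UM region
    size_t              chunk_size; // fixed allocation size (e.g. a power of 2)
    size_t              num_chunks; // total chunks in the pool
    unsigned long long  next;       // index of the next unused chunk

    // Callable from device code; returns nullptr when the pool is exhausted.
    __device__ void *alloc() {
        unsigned long long idx = atomicAdd(&next, 1ULL);
        if (idx >= num_chunks) return nullptr;
        return base + idx * chunk_size;
    }
};

// Host-side setup: both the pool metadata and its backing storage live in
// Unified Memory, so the CPU can inspect what the GPU allocated.
ChunkPool *create_pool(size_t chunk_size, size_t num_chunks) {
    ChunkPool *pool = nullptr;
    cudaMallocManaged(&pool, sizeof(ChunkPool));
    cudaMallocManaged(&pool->base, chunk_size * num_chunks);
    pool->chunk_size = chunk_size;
    pool->num_chunks = num_chunks;
    pool->next = 0;
    return pool;
}

// Example device-side use: each thread grabs one chunk from the pool.
__global__ void grab_chunks(ChunkPool *pool, void **out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = pool->alloc();
}

A dynamic hash table built on such an allocator would, for example, allocate its buckets or slab lists as fixed-size chunks from the pool, which is where the generality and efficiency comparison against [7] comes in.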

Hardware

Project deliverables and deadlines

  1. (Nov 6) A clear description of your planned project and a brief summary of your understanding of existing memory allocators.

  2. (Nov 13) A survey of GPU memory technologies, including their performance characteristics.

  3. (Dec 6) An implementation of the GPU memory allocator.

  4. (Dec 6) A project report, written like an ACM conference paper, that summarizes your work.

Papers

[1] Unified Memory Programming (link)

[2] Everything you need to know about unified memory (link)

[3] Memory Management on Modern GPU Architectures (link)

[4] Galois manual (link)

[5] Performance Evaluation of Data Migration Methods Between the Host and the Device in CUDA-Based Programming (link)

[6] Overlapping Host-to-Device Copy and Computation using Hidden Unified Memory (link)

[7] A dynamic hash table for the GPU, IPDPS 2018 (link)