PhD Proposal: Dong Li, ACES 5.336
Date: Friday, June 21
Time: 1:00 PM
Place: ACES 5.336
Research Supervisors: Professor Donald Fussell and Professor Doug Burger
Title: Dynamic Resource Management of Memory Hierarchies in Throughput-Oriented Processors
Throughput-oriented processors such as many-core processors and massively threaded GPUs have been widely adopted for general-purpose computation, but in many cases their throughput is limited by the memory system. The goal of this proposal is to optimize the memory system of throughput-oriented processors by dynamically managing resources in the memory hierarchy, including all levels of cache, data-transmission priority, and DRAM bandwidth.
My research is motivated by the following observations about throughput-oriented processors: (1) A massively threaded application often generates many outstanding memory requests and overwhelms the cache. In many cases, allocating cache disproportionately among threads, so that a subset of threads can keep their data sets in cache, reduces cache misses and improves overall throughput. (2) Throughput-oriented processors often cannot achieve a good cache hit ratio and high memory bandwidth utilization simultaneously. (3) Hardware hazards often limit overall throughput.
The major contributions of this proposal are as follows: (1) We categorize hardware hazards and analyze the delays of the data paths between streaming multiprocessors (SMs) and DRAM controllers. We identify MSHR (miss status holding register) outage and L2 interface contention as memory system bottlenecks in many cases. (2) We propose an asymmetric cache partition (ACP) mechanism. ACP allows fine-grained allocation of cache capacity to threads with the goal of increasing throughput. Low-priority threads are partially throttled by the limited cache capacity they can access. High-priority threads reserve a large portion of the cache and thus achieve a higher cache hit ratio, which leads to higher throughput. (3) We propose a thread-level cache bypassing (TCB) mechanism, which allows a subset of threads (cache-threads) to use the cache exclusively and achieve a good cache hit ratio. The other threads (bypass-threads) are forced to bypass the cache and use the DRAM bandwidth left over by the cache-threads. TCB enables processors to achieve a good cache hit ratio and high bandwidth utilization simultaneously. At the same time, TCB reduces hardware hazards by bypassing bottlenecks in the GPU memory system. Experimental results show that TCB has the potential to significantly improve the performance of memory-bound applications.
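The intuition behind TCB can be illustrated with a toy model. The sketch below is not the proposal's simulator or its hardware design; it is a minimal Python illustration under assumed conditions: a single shared LRU cache, threads that each sweep repeatedly over a private set of cache lines in round-robin order, and a hypothetical parameter cache_threads that selects how many threads may use the cache. The function name simulate and all of its parameters are invented for this example.

```python
from collections import OrderedDict


def simulate(num_threads, lines_per_thread, cache_lines, passes, cache_threads):
    """Toy model of a shared LRU cache with thread-level bypassing.

    Threads with id < cache_threads use the cache; the rest bypass it.
    Returns (cache_hits, dram_accesses). All names are hypothetical.
    """
    cache = OrderedDict()  # keys are (thread, line); order tracks recency
    hits = 0
    dram = 0
    for _ in range(passes):
        for line in range(lines_per_thread):
            for thread in range(num_threads):  # round-robin interleaving
                addr = (thread, line)
                if thread >= cache_threads:
                    dram += 1  # bypass-thread: goes straight to DRAM
                    continue
                if addr in cache:
                    cache.move_to_end(addr)  # refresh LRU position
                    hits += 1
                else:
                    dram += 1  # miss: fetch from DRAM and fill the cache
                    cache[addr] = True
                    if len(cache) > cache_lines:
                        cache.popitem(last=False)  # evict least recently used


    return hits, dram


# 8 threads x 16 lines = 128 lines of footprint against a 64-line cache.
baseline = simulate(8, 16, 64, passes=4, cache_threads=8)
bypassed = simulate(8, 16, 64, passes=4, cache_threads=4)
print("all threads cached:", baseline)
print("4 cache-threads, 4 bypass-threads:", bypassed)
```

In this assumed configuration, letting every thread use the cache causes LRU thrashing and no hits at all, while restricting the cache to four threads lets their combined footprint fit, so they hit on every pass after the first and total DRAM traffic drops. Real workloads, replacement policies, and the hardware hazards discussed above are not modeled here.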