TRIPS Tutorial at ISCA 2005:
Design and Implementation of the TRIPS EDGE Architecture

Saturday, June 4, 9:00 am - 5:30 pm


Presenters


Abstract

The computer architecture community is currently in the midst of a rare transition. The effort of moving to wider-issue RISC and CISC processors has been all but abandoned due to power and complexity limitations. Power limitations coupled with diminishing returns have also greatly slowed improvements in clock rate, which will now come solely from faster devices (15% per year) instead of faster devices coupled with deeper pipelining (40% per year from 1990 to 2004). The slowing performance gains from traditional processing cores have produced an industrial shift of focus to chip multiprocessors (CMPs). While CMPs will doubtless improve the performance of many workloads, this trend puts the burden of improved performance squarely on programmers. Furthermore, applications that are too difficult to parallelize will not benefit from the CMP trend.

An alternative is Explicit Data Graph Execution (EDGE) architectures, which, unlike RISC and CISC instruction sets, explicitly encode dependences into individual instructions. This encoding permits dataflow-like execution without the hardware overheads of conventional out-of-order processors, in which the hardware must reconstruct dependences on the fly. While CMPs of EDGE processors are certainly possible, EDGE architectures also provide the option of scaling to wider-issue cores, further improving single-thread performance with no programmer intervention.

The TRIPS architecture is an example of an EDGE architecture that supports a static placement, dynamic issue (SPDI) execution model. TRIPS programs are compiled into graphs of predicated hyperblocks, each of which is represented internally as a dataflow graph, with instructions communicating directly through instruction-encoded dependences. Hyperblocks communicate with one another through their sets of input and output registers. The TRIPS architecture allows up to eight 128-instruction hyperblocks to execute on a processor core simultaneously, enabling a 1,024-instruction window. We have designed a full proof-of-concept implementation of the TRIPS architecture; each chip contains two 16-wide out-of-order issue cores and a 1 MB static NUCA cache.
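The idea of instruction-encoded dependences can be illustrated with a small sketch. It is a toy model, not the actual TRIPS instruction encoding: the tuple format, the `run_block` function, and the register names `r1`, `r2`, and `r3` are all hypothetical and chosen only for illustration. Each instruction names the consumers of its result instead of naming source registers, so an instruction can fire as soon as both of its operands have arrived, and the block exchanges values with the outside only through its register reads and writes.

```python
import operator
from collections import deque

# Hypothetical encoding: each instruction is (operation, list of targets).
# A target is ("instr", index, operand_slot) or ("write", register_name).
# This block computes (r1 + r2) * (r1 - r2) and writes the result to r3.
BLOCK = [
    (operator.add, [("instr", 2, 0)]),  # 0: sum goes to left operand of 2
    (operator.sub, [("instr", 2, 1)]),  # 1: difference goes to right operand of 2
    (operator.mul, [("write", "r3")]),  # 2: product is a block output
]

# Block inputs: each register read lists the operand slots it feeds.
READS = {
    "r1": [("instr", 0, 0), ("instr", 1, 0)],
    "r2": [("instr", 0, 1), ("instr", 1, 1)],
}


def run_block(block, reads, regs):
    """Execute one block in dataflow order; return (outputs, firing order)."""
    operands = [dict() for _ in block]  # operand slots received so far
    outputs = {}                        # register writes produced by the block
    ready = deque()                     # instructions with both operands present

    def deliver(target, value):
        if target[0] == "write":
            outputs[target[1]] = value
            return
        _, index, slot = target
        operands[index][slot] = value
        if len(operands[index]) == 2:   # all toy instructions take two operands
            ready.append(index)

    # Inject the block's input registers into their consumers.
    for reg, targets in reads.items():
        for target in targets:
            deliver(target, regs[reg])

    fired = []
    while ready:
        index = ready.popleft()
        op, targets = block[index]
        result = op(operands[index][0], operands[index][1])
        fired.append(index)
        for target in targets:          # forward the result directly to consumers
            deliver(target, result)
    return outputs, fired


outputs, fired = run_block(BLOCK, READS, {"r1": 7, "r2": 3})
print(outputs, fired)
```

Note that the sketch never searches for dependences at run time: the producer already knows where its result must go, which is the property the paragraph above contrasts with conventional out-of-order hardware.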

This tutorial will cover the architecture and the microarchitecture of the TRIPS prototype in detail. We will briefly show how EDGE architectures can address power, wire delay, and complexity issues. We will cover the salient features of the instruction set, emphasizing the trade-offs that we found during its definition. We will then explore the microarchitecture, focusing on the components that differ from conventional architectures. The emphasis will be on both high-level issues and implementation challenges. Finally, we will present the new compiler algorithms and implementation necessary to compile high-quality TRIPS code, as well as the results of a prototype performance analysis.

Our goal is to give tutorial participants a detailed understanding of EDGE architectures, implementation challenges, performance trade-offs, and unanswered research questions. We have reserved time for informal discussions and questions.

For more information on the project, please visit the TRIPS project website.


Intended audience

Any industrial practitioners and academic or industrial researchers interested in learning about EDGE ISAs and the TRIPS implementation in depth, as well as how they provide potential power, scalability, and performance advantages for future CMOS technologies. The presentations will assume that participants have a solid understanding of traditional architecture and microarchitecture techniques, such as branch prediction and out-of-order execution, as well as datapath design and pipelining.


Agenda


Speaker Biographies

Doug Burger is an associate professor in the Department of Computer Sciences at the University of Texas at Austin, and is co-leader of the TRIPS project.

Stephen W. Keckler is an associate professor in the Department of Computer Sciences at the University of Texas at Austin, and is co-leader of the TRIPS project.

Kathryn McKinley is a professor in the Department of Computer Sciences at the University of Texas at Austin. She leads the TRIPS compiler effort.

Robert McDonald is the chief engineer for the TRIPS prototype chip at the University of Texas at Austin. Prior to that position, he worked at Chicory, Inc., and before that was a logic designer on the IBM Power4 processor.

Members of the TRIPS development team will also be presenting at the tutorial:

Ramdas Nagarajan is a fifth-year Ph.D. student at the University of Texas at Austin. He received the UT-Austin Cooperative Society Award for best research paper of 2002. His dissertation topic covers high-ILP execution for single-threaded codes on EDGE architectures.

Nitya Ranganathan is a fourth-year Ph.D. student at the University of Texas at Austin. Her dissertation topic covers control flow prediction and speculation in EDGE architectures.

Haiming Liu is a fourth-year Ph.D. student at the University of Texas at Austin. His dissertation topic covers instruction fetch strategies and multithreading issues for EDGE architectures.

Karu Sankaralingam is a fifth-year Ph.D. student at the University of Texas at Austin. He was first author on a paper selected for IEEE Micro's 2003 "Top Picks in Computer Architecture" issue, and recently received the 2004 James C. Browne Fellowship from the Dept. of Computer Sciences at UT-Austin. His dissertation topic covers streaming, vector, and fine-grained parallel execution on EDGE architectures.

Simha Sethumadhavan is a fourth-year Ph.D. student at the University of Texas at Austin. He was first author on a paper selected for IEEE Micro's 2004 "Top Picks in Computer Architecture" issue. His dissertation topic covers efficient memory ordering and disambiguation for both conventional and EDGE architectures.

Changkyu Kim is a fourth-year Ph.D. student at the University of Texas at Austin. He was first author on a paper selected for IEEE Micro's 2003 "Top Picks in Computer Architecture" issue. His dissertation topic covers non-uniform cache access (NUCA) architectures.

Paul Gratz is a third-year Ph.D. student at the University of Texas at Austin. His dissertation topic covers on-chip communication networks.

PK Shivakumar is a fourth-year Ph.D. student at the University of Texas at Austin. His dissertation topic covers reliability issues for high-performance microprocessors implemented in future CMOS technologies.

Aaron Smith is a first-year Ph.D. student at the University of Texas at Austin, before which he held positions in software development at Metrowerks and Dell, Inc. He has extensive industrial experience in compiler design and implementation. His dissertation topic covers compilation strategies for EDGE architectures.


Page last modified by dburger@cs.utexas.edu on February 6, 2005