Computer Architecture Seminar Abstracts

Fall 2008

David Brooks

Computer Design in the Nanometer Scale Era: Challenges and Solutions


Technology scaling has enabled tremendous growth in the computing industry over the past few decades. However, recent trends in power dissipation, reliability, thermal constraints, and device variability threaten to limit the continued benefits of device scaling, curtail performance improvements, and cause increased leakage power in future technology generations. The temporal and spatial scales of these effects motivate holistic solutions that span the circuit, architecture, and software layers. In this talk, I will describe several ongoing projects that seek to address technology scaling issues. These projects include efforts in the areas of a) power and performance modeling and design space optimization for future chip-multiprocessor systems, b) variability-tolerant design of memory hierarchies, and c) accelerator-based architectures for power/performance efficiency. The talk will also discuss our chip prototyping efforts that support this work.


David Brooks joined Harvard University in September of 2002 and is currently an Associate Professor of Computer Science. Dr. Brooks received his B.S. (1997) degree from the University of Southern California and his M.A. (1999) and Ph.D. (2001) degrees from Princeton University, all in Electrical Engineering. Prior to joining Harvard University, Dr. Brooks was a Research Staff Member at the IBM T.J. Watson Research Center. Dr. Brooks received an IBM Faculty Partnership Award in 2004, an NSF CAREER award in 2005, and a DARPA Young Faculty Award in 2007. His research interests include architecture and software approaches to address power, reliability, and variability issues for embedded and high-performance computer systems.

Sanjay Patel
University of Illinois at Urbana-Champaign

Rigel: 1000-core Computing for the Masses


Consumer-side computing is again exerting its forces on the computing industry. Video gaming, diverse media, and high-definition video content have created a deep consumer demand for higher performance, and have brought supercomputing to the masses. Game consoles and graphics chips today have performance levels that have broken the TFLOPs barrier, with chip architectures that boldly embrace parallelism with 100s of cores. High-performance, parallel computing has entered our living room.

This democratization of high-performance computing is creating a transformational environment for many application domains, which are seeing new capabilities enabled by these increases in performance. Rigel is a project that we are embarking on at Illinois, in which we are developing a 1000+ core architecture capable of scaling beyond 10 TFLOPs. We are using the Rigel chip as a development catalyst for parallel programming frameworks for the masses and for next-generation visual computing applications. Rigel espouses a massive MIMD, scalable incoherent shared memory model with a simple on-chip interconnect. In this talk, I will provide an overview of the project, the technical rationale for the architecture, and some details on the low-level parallel programming model.


Sanjay J. Patel is an Associate Professor of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign.

His research interests include high-performance and parallel chip architectures, parallel programming models, and chip implementation.

From 2005 to 2008, he was the CTO and Chief Architect of AGEIA Technologies, a fabless semiconductor company that developed chips for accelerating physical simulation for video games, before it was acquired by Nvidia Corporation. He also has industry experience architecting, designing, and validating chips at Digital Equipment Corporation, Intel, and HAL Computer Systems.

Patel earned his Bachelor (1990), Master of Science (1992) and Ph.D. (1999) in Computer Science and Engineering from the University of Michigan, Ann Arbor.

Gabriel Loh
Georgia Institute of Technology

3D Integration for High-Performance Processor Microarchitectures


Three-dimensional integration is a new fabrication technology that allows stacking multiple layers of silicon into a single, tightly integrated system. The advantages include greater device density, drastic reduction in wiring due to the flexibility of 3D placement and routing, and the potential for integration of heterogeneous technologies. In this talk, I will present some of the 3D microarchitecture work happening at Georgia Tech, which includes several applications of 3D at different levels of granularity: at the circuit level, functional unit block level, and at the system level.


Gabriel H. Loh received the B.E. degree in electrical engineering from Cooper Union, New York, NY, in 1998, and the M.S. and Ph.D. degrees in computer science from Yale University, New Haven, CT, in 1999 and 2002, respectively. From 2003 to 2004, he was a Senior Researcher with the Microarchitecture Research Laboratory at Intel Corporation in Austin, TX. He is currently an Assistant Professor in the School of Computer Science at the Georgia Institute of Technology. His research interests include computer architecture, processor microarchitecture, simulation, circuit design, three-dimensional integration technology, and ice hockey. He is a recipient of the NSF faculty early career development (CAREER) award.

Kunle Olukotun
Stanford University

Towards Pervasive Parallelism: Parallel Applications without Parallel Programming


Now that we are firmly entrenched in the multicore architecture era, to take full advantage of these architectures, many application developers will have to develop parallel applications. However, today, parallel programming is so difficult that it is only practiced by a few elite programmers. Thus, a key research question is what set of hardware and software technologies will make parallel computation accessible to average programmers. In this talk, I will outline the approach we are taking to answer this question in the Stanford Pervasive Parallelism Laboratory.


Kunle Olukotun is a Professor of Electrical Engineering and Computer Science at Stanford University, where he has been on the faculty since 1992. Olukotun has been a researcher in and proponent of chip multiprocessor technology since the mid-1990s. Olukotun is well known for leading the Stanford Hydra research project, which developed one of the first chip multiprocessors with support for thread-level speculation (TLS). Olukotun founded Afara Websystems to develop high-throughput, low-power server systems with chip multiprocessor technology. Afara was acquired by Sun Microsystems; the Afara microprocessor technology, called Niagara, is the basis of systems that have become one of Sun's fastest ramping products ever. Olukotun is actively involved in research in computer architecture, parallel programming environments, and scalable parallel systems. Olukotun currently directs the Pervasive Parallelism Lab (PPL), which seeks to proliferate the use of parallelism in all application areas.

Olukotun is an ACM Fellow and IEEE Fellow. He has authored many papers on CMP design and parallel software and recently completed a book on CMP architecture. Olukotun received his Ph.D. in Computer Engineering from The University of Michigan.

Beeman Strong

Looking Inside Intel: the Core (Nehalem) Microarchitecture


Intel's next-generation microarchitecture (Nehalem) represents the next step in processor energy efficiency, performance, and dynamic scalability and was designed from the ground up to take advantage of hafnium-based Intel® 45nm high-k metal gate silicon technology.

What you will learn from this session:

*Details behind key microarchitecture features including:

    - Enhancements to the out-of-order execution engine

    - Enhancements to the platform bandwidth

    - Enhancements to the cache subsystem

    - Extensions to the instruction set with SSE4.2

*Description of the power management innovations on the next generation Intel® microarchitecture (Nehalem) family of processors, including:

    - Impact on typical and idle power consumption

    - Implications to processor performance


Beeman Strong received his BS from UT in 1996. He then spent 9+ years in architecture validation at Intel, starting with the P4 (Willamette). He then moved to the architecture team during the Nehalem project in 2005, focusing on branch prediction, livelock breakers, and virtualization. He is now the branch predictor owner of Intel's next core microarchitecture.

Kei Hiraki
The University of Tokyo

GRAPE-DR Project: a combination of peta-scale computing and high-speed networking


The University of Tokyo and the National Observatory of Japan have been jointly developing the GRAPE-DR system, which realizes a combination of peta-scale computing and a very high-speed data-sharing system for scientific computing. In this talk, we describe the outline of the GRAPE-DR project, the architecture of the GRAPE-DR processor, and the methods used in the Data-Reservoir system, which is used to share data among distant research institutes.

The main objectives of the GRAPE-DR system are (1) realization of very cost-effective and power-efficient computation and (2) construction of a practical peta-scale computing system for computation-intensive scientific applications. GRAPE-DR adopts a different approach: a SIMD architecture without interconnects between processing elements (PEs). Figure 1 shows a block diagram of the GRAPE-DR processor chip. All data transfers to and from the PEs are performed through a broadcast memory and a reduction network with arithmetic units. This architecture is effective in reducing the amount of hardware. As shown in Table 1, the die is much smaller than those of other chips for HPC systems, such as the nVIDIA 8800 or CELL.

The GRAPE-DR chip is carefully designed to compute several important applications, including the n-body problem for galaxy generation, molecular dynamics, quantum molecular simulation (e.g. FMO), dense linear equations (e.g. Linpack), and simulation in bio-informatics.
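
The broadcast-and-reduce data flow described above can be illustrated with a small software sketch for the n-body case. This is only an assumed model, not the GRAPE-DR instruction set or programming interface: the function names, the units (G = 1), and the softening term eps2 are all hypothetical. Each PE holds its own slice of particles in local memory, one target position is broadcast to every PE, each PE runs the same computation on its own data, and a pairwise tree sum stands in for the reduction network. The PEs never exchange data with one another.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Particle = Tuple[float, float, float, float]  # x, y, z, mass (hypothetical layout)


def pe_partial_acceleration(i_pos: Vec3, local_js: Sequence[Particle],
                            eps2: float) -> Vec3:
    """One PE: acceleration on the broadcast position from locally stored particles."""
    ax = ay = az = 0.0
    for jx, jy, jz, mass in local_js:
        dx, dy, dz = jx - i_pos[0], jy - i_pos[1], jz - i_pos[2]
        r2 = dx * dx + dy * dy + dz * dz + eps2  # softened squared distance
        scale = mass / (r2 * r2 ** 0.5)          # m / r^3, with G = 1 (assumption)
        ax += dx * scale
        ay += dy * scale
        az += dz * scale
    return (ax, ay, az)


def reduce_tree(values: Sequence[Vec3]) -> Vec3:
    """Pairwise tree sum, standing in for a reduction network with adders."""
    vals: List[Vec3] = list(values)
    if not vals:
        return (0.0, 0.0, 0.0)
    while len(vals) > 1:
        nxt: List[Vec3] = []
        for k in range(0, len(vals) - 1, 2):
            a, b = vals[k], vals[k + 1]
            nxt.append((a[0] + b[0], a[1] + b[1], a[2] + b[2]))
        if len(vals) % 2:          # odd element passes through to the next level
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]


def broadcast_and_reduce(i_pos: Vec3, pe_memories: Sequence[Sequence[Particle]],
                         eps2: float = 1e-6) -> Vec3:
    """Broadcast one position to all PEs, then sum their partial results."""
    # Same operation on every PE, different local data: the SIMD step.
    partials = [pe_partial_acceleration(i_pos, mem, eps2) for mem in pe_memories]
    return reduce_tree(partials)
```

For example, `broadcast_and_reduce((0.0, 0.0, 0.0), [[(2.0, 0.0, 0.0, 4.0)], []], eps2=0.0)` models two PEs, one of which holds a single particle of mass 4 at distance 2 from the broadcast position.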


Kei Hiraki is a Professor in the Department of Computer Science, Graduate School of Information and Technology at the University of Tokyo. He received a BA, MS, and Ph.D. in physics from the University of Tokyo. He then worked in the Electrotechnical Laboratory at MITI in Japan from 1982 until 1988, at which point he came to the USA to work at the IBM T.J. Watson Research Center. In 1991, he returned to Japan to be a professor at the University of Tokyo.

Prof. Hiraki has conducted research on a wide range of topics, including dataflow architecture, distributed shared memory, highly parallel architecture, and very high-speed internet communication. He currently holds Internet2 Land Speed Records in all classes for high-speed, long-distance TCP communications.

Rajit Manohar
Cornell University

Ultra Low Power Processor for Sensor Networks


We present the design of SNAP, an ultra low power processor for sensor network applications. SNAP's architecture has been designed to be efficient at executing common protocol operations in sensor network applications. The circuit style used by SNAP has been optimized for both area and energy to enable the development of a small, long lifetime sensor node. The asynchronous nature of the processor enables efficient transitions from idle to active back to idle state. We present measured performance and energy results for our design. This is the first processor developed specifically for sensor network applications.


Rajit Manohar is an Associate Professor of Electrical and Computer Engineering at Cornell, where his group conducts research on asynchronous design. He received his B.S. (1994), M.S. (1995), and Ph.D. (1998) from Caltech, and has been on the Cornell faculty since 1998. He is the recipient of an NSF CAREER award, three best paper awards, and five teaching awards, and was named to MIT Technology Review's top 35 young innovators under 35. He is a co-founder of Achronix Semiconductor, a fabless semiconductor company developing high-performance FPGAs.