TeraFlops for the Masses: Killer Apps of Tomorrow
Many think that the general-purpose processors we use today have enough power to run most applications; this is what we call 'good enough computing.' Many also believe that the design complexity and power limitations of modern-day processors don't allow them to scale well to the next level of performance. If all this is true, perhaps we have reached a supply-demand equilibrium which says, "We cannot offer a significant speedup for your apps, and why would you need it anyway?" This talk is about why this is a myth. Next-generation general-purpose mass applications require processing power of TeraFlops and beyond, and the resulting workloads are likely to have profound implications for the processor platform designs of tomorrow.
Dr. Pradeep K. Dubey is a senior principal engineer and manager of innovative platform architecture in the Corporate Technology Group at Intel.
His research interests include computer architectures for new application paradigms in future computing environments. Dubey previously worked at the IBM T.J. Watson Research Center and at Broadcom. He was one of the principal architects for the AltiVec multimedia extension to the PowerPC architecture.
He also worked on the design, architecture, and performance issues of various microprocessors, including Intel's 80386, 80486, and Pentium processors. He holds 24 patents and is an IEEE Fellow.
University of Colorado
Adaptive Resource Management in Multithreaded Architectures
To leverage the growing number of available transistors, designers are moving toward including multiple multithreaded processor cores on a single die. As these systems increase in size, designers will face new challenges in simultaneously managing current swings (di/dt), strictly limiting power consumption, and efficiently managing on-chip cache memory resources to meet system performance demands and real-time deadlines. Unfortunately, existing general-purpose operating systems and run-time systems are not adequately designed to support these architectures. By integrating run-time monitoring and management techniques that dynamically adjust system resource allocation to application characteristics, however, these architectures can enable unparalleled advances in computing systems.
We propose a full-system approach for resource management of multithreaded multi-core systems. In such systems, opportunities to improve system behavior via adaptation occur at three time scales: the microarchitecture controls activities that occur within tens to hundreds of cycles, the run-time system controls activities on the scale of thousands to millions of cycles, and the operating system controls resources and activities at larger time scales. We present our initial investigation into novel techniques for each of these components that operate synergistically to enable the full potential of future systems.
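As a toy illustration of the middle, run-time-system layer of adaptation described above (not code from the talk; the way-partitioned cache, the epoch lengths, and the miss-rate figures are all hypothetical), the sketch below shows a feedback loop that shifts shared-cache ways toward whichever thread is missing more, one way per monitoring epoch:

```python
# Illustrative sketch: a run-time controller reallocating the ways of a
# shared, way-partitioned cache between two threads based on observed
# miss rates. All names and numbers are hypothetical.

TOTAL_WAYS = 8  # assumed 8-way shared cache

def reallocate(ways_a, miss_a, miss_b):
    """Shift one cache way toward the thread with the higher miss rate,
    always leaving each thread at least one way."""
    if miss_a > miss_b and ways_a < TOTAL_WAYS - 1:
        return ways_a + 1
    if miss_b > miss_a and ways_a > 1:
        return ways_a - 1
    return ways_a

# Thread A is cache-hungry: over successive epochs, ways migrate to it.
ways = 4  # start with an even split
for miss_a, miss_b in [(0.30, 0.05), (0.25, 0.06), (0.20, 0.08)]:
    ways = reallocate(ways, miss_a, miss_b)
print(ways)  # 7
```

A real run-time system would drive this from hardware performance counters sampled every few thousand cycles, which is what places this policy at the middle time scale rather than in the microarchitecture or the OS.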
Dan Connors is an Assistant Professor at the University of Colorado at Boulder. He received his Ph.D. in Computer Engineering from the University of Illinois at Urbana-Champaign in 2000. His research explores the interaction of compilers, run-time optimizers, and operating systems in modern architectures. He directs the DRACO research group, which investigates run-time optimization and compiler technologies that enable optimization, power efficiency, prefetching, and thread scheduling in future multi-threaded multi-core systems.
University of Michigan
Customizing the Computation Capabilities of Microprocessors
Application-specific extensions to the computational capabilities of a processor provide an efficient means of meeting the growing performance demands of future applications without compromising power and cost considerations. Critical portions of application dataflow graphs are collapsed for accelerated execution on specialized hardware. Collapsing dataflow subgraphs compresses the latency along critical paths and reduces the number of intermediate results stored in the register file. Programmability is ensured by maintaining instruction-driven control of the specialized hardware. In this talk, I will examine two alternative strategies for customization: visible, where the processor instruction set is extended with new instructions that exploit specialized function units; and transparent, where computation subgraphs are automatically identified at run time and mapped onto a configurable compute accelerator. Visible customization provides the largest gains and is effective for application-specific instruction processors, where a new processor is created for a particular application domain. Transparent customization enables application-specific processing on a general-purpose processor, where the binary instruction format is fixed.
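The effect of collapsing a subgraph can be sketched in miniature (this is our illustration, not the talk's toolchain; the instruction names and the tiny interpreter are invented): a three-instruction chain is replaced by one fused super-instruction, so the two intermediate values never pass through the register file.

```python
# Hypothetical sketch of dataflow-subgraph collapsing: a shift-add-and
# chain becomes a single fused operation, eliminating the writes of the
# intermediate results t0 and t1 to the register file.

def interpret(program, regs):
    """Tiny interpreter over (op, dst, a, b) tuples; returns the number
    of register-file writes the program performs."""
    writes = 0
    for op, dst, a, b in program:
        if op == "shl":
            regs[dst] = regs[a] << b
        elif op == "add":
            regs[dst] = regs[a] + regs[b]
        elif op == "and":
            regs[dst] = regs[a] & b
        elif op == "shl_add_and":  # the collapsed subgraph as one unit
            regs[dst] = ((regs[a] << 2) + regs[b]) & 0xFF
        writes += 1
    return writes

regs = {"r1": 3, "r2": 5}
original = [("shl", "t0", "r1", 2),
            ("add", "t1", "t0", "r2"),
            ("and", "r3", "t1", 0xFF)]
fused = [("shl_add_and", "r3", "r1", "r2")]

interpret(original, dict(regs))  # 3 register writes, r3 = 17
interpret(fused, dict(regs))     # 1 register write, same r3 = 17
```

The fused version also models the latency compression the abstract mentions: one issue slot instead of three dependent ones along the critical path.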
Scott Mahlke is the Morris Wellman Assistant Professor in the Electrical Engineering and Computer Science Department at the University of Michigan. From 1995 to 2001, he was a member of the Compiler and Architecture Research Group at Hewlett-Packard Laboratories. He received his Ph.D. in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign. His research interests include compilers, high-level synthesis, and computer architecture. He directs the Compilers Creating Custom Processors group at Michigan, which focuses on automatic design of application-specific processors and hardware accelerators.
Georgia Institute of Technology
Energy Aware Algorithm Design via Probabilistic Computing: From Algorithms and Models to Moore's Law and Novel (Semiconductor) Devices
The energy consumed by computation is a significant concern, especially in the context of embedded systems, on par with the past focus on raw speed (or its derivative, performance) in the high-performance computing domain. In this talk, we will outline an entirely new approach to energy-aware computing: trading the probability of a bit being correct for savings in the energy consumed, yielding a probabilistic bit, or PBIT, instead of a conventional bit that is guaranteed to be correct. At its heart, the approach is built on the fundamental and novel thesis that the energy consumed by a computation is proportional to its accuracy, characterized as the probability with which each bit is computed correctly. With this as background, probabilistic hardware devices (which can be viewed as the hardware counterparts of the well-known probabilistic algorithms) and gates realized in conventional CMOS technology for computing PBITs will be described. Our probabilistic devices are constructed through the counterintuitive approach of using noise, increasingly viewed as a hurdle to sustaining Moore's law, as a resource rather than as an impediment. Specifically, we have demonstrated that coupling thermally induced sources of noise, as well as the prevalent power-supply noise, with a conventional CMOS device yields a probabilistic switch, which can in turn serve as a basis for realizing probabilistic applications in silicon.
These probabilistic (hardware) switches compute with a definite probability of error, and have been demonstrated to serve as natural building blocks in architectures supporting probabilistic algorithms, yielding significant savings in the (energy x performance) metric across a variety of embedded computing applications, ranging over speech and pattern recognition, robotics, and others: improvements of over a factor of 100 in the context of AMI 0.5µm, TSMC 0.25µm, and proprietary deep-submicron processes, compared to executing the same applications on a low-energy embedded processor, the StrongARM SA-1100. At a deeper level, all of this work rests on the twin foundations of classical thermodynamics (of Maxwell, Boltzmann, and Gibbs) and the relatively modern theory of computational complexity. Time permitting, these foundations will be surveyed.
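The energy-versus-correctness trade at the heart of a probabilistic switch can be imitated with a toy Monte Carlo model (this is our own illustration, not the speaker's circuits; the noise amplitude, threshold, and energy model are assumptions): an inverter-like device driven at supply voltage V spends energy roughly proportional to V squared per switch, while fixed-amplitude Gaussian noise flips the output more often as V shrinks.

```python
# Toy model of a PBIT: lower supply voltage means quadratically less
# switching energy (~ C*V^2) but a higher chance that thermal/supply
# noise drags the output across the V/2 decision threshold.
import random

def pbit_error_rate(v_supply, noise_rms=0.3, trials=100_000, seed=1):
    """Monte Carlo estimate of the bit-flip probability at voltage V."""
    random.seed(seed)
    errors = 0
    for _ in range(trials):
        # Ideal '1' output sits at v_supply; additive Gaussian noise
        # causes an error whenever the sample falls below v_supply/2.
        if v_supply + random.gauss(0.0, noise_rms) < v_supply / 2:
            errors += 1
    return errors / trials

for v in (1.0, 0.6, 0.3):
    energy = v * v  # relative C*V^2 energy per switch (C = 1)
    print(v, energy, pbit_error_rate(v))
# Lower supply voltage: quadratically less energy, but more bit errors.
```

Running this shows the monotone trade the abstract describes: as the supply drops toward the noise floor, the switch degrades gracefully from a deterministic bit into a PBIT with a tunable probability of correctness.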
Krishna V. Palem holds professorships with tenure in Electrical and Computer Engineering and in Computer Science in the College of Computing, holds a senior research leadership position in the College of Engineering, and has been the founding director of the Center for Research in Embedded Systems and Technology (CREST) (www.crest.gatech.edu) at the Georgia Institute of Technology since 1999. Previously, he held positions at the IBM T. J. Watson Research Center and at NYU's Courant Institute of Mathematical Sciences (Computer Science). His work is recognized internationally in academia and in industry, ranging from contributions to algorithms, compiler optimizations, and reconfigurable computing systems to power-aware computing and, most recently, microelectronics. Over the past decade, he has focused on applying innovations from these disciplines to the increasingly significant domain of embedded computing systems. He has received awards for excellence from Hewlett-Packard, IBM, and Panasonic, and serves, among other roles, on the editorial board of the recently formed ACM Transactions on Embedded Computing Systems. Palem laid the foundations of architecture assembly, which is at the heart of the product offerings of Proceler Inc., an Atlanta-based venture. The prestigious Analysts' Choice Awards recognized Proceler's technology by nominating it as one of the outstanding technologies of 2002, and his Ph.D. advisee Suren Talla received a dissertation award from NYU for aspects of this work. He has chaired and co-chaired meetings whose advice has led to funding initiatives in embedded and hybrid systems in the US, as well as at A*Star, Singapore's leading research funding agency. He was a Schonbrunn visiting professor at the Hebrew University of Jerusalem, Israel, where he was recognized for excellence in teaching. He is a Fellow of the IEEE.
The Future Evolution of High-Performance Microprocessors
The evolution of high-performance microprocessors is fast approaching several significant inflection points.
First, the marginal utility of additional single-core complexity is now rapidly diminishing due to a number of factors. The gain in instructions per cycle from increasing the sizes and numbers of functional units has plateaued. Meanwhile, the increasing sizes of functional units and cores are beginning to have significant negative impacts on pipeline depths and the scalability of processor clock cycle times.
Second, the power dissipation of high-performance microprocessors has increased rapidly over the last two decades, even as device switching energies have been significantly reduced by supply-voltage scaling. However, future voltage scaling will be limited by minimum practical threshold voltages, and current high-performance microprocessors are already near market limits of acceptable power dissipation. Thus, scaling microprocessor performance while maintaining or even reducing overall power dissipation, without the benefit of appreciable further voltage scaling, will prove especially challenging.
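The arithmetic behind this point can be sketched with the standard dynamic-power relation P ≈ a·C·V²·f (the constants below are illustrative, not measurements from the talk): historically, each frequency increase was offset by scaling V down, but with V pinned near threshold limits, performance scaling hits the power budget directly.

```python
# Back-of-the-envelope dynamic switching power: P = a * C * V^2 * f.
# Activity factor, capacitance, voltage, and frequency are made up.

def dynamic_power(activity, capacitance, v_supply, freq_hz):
    return activity * capacitance * v_supply**2 * freq_hz

base = dynamic_power(0.2, 1e-9, 1.2, 2e9)           # baseline core

# Past regime: double f, but scale V by ~0.7x per generation.
scaled_v = dynamic_power(0.2, 1e-9, 1.2 * 0.7, 4e9)

# Future regime: double f with V stuck at its threshold-limited floor.
fixed_v = dynamic_power(0.2, 1e-9, 1.2, 4e9)

print(scaled_v / base)  # ~0.98: power stays roughly flat
print(fixed_v / base)   # 2.0: power doubles without voltage scaling
```

The quadratic dependence on V is why losing voltage scaling is the inflection point: a 0.7x voltage step alone halves power, absorbing a doubling of frequency, and no comparable lever remains once V stops scaling.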
In this keynote talk, we will discuss these issues and propose likely scenarios for the future evolution of high-performance microprocessors.
Norman P. Jouppi is a Fellow at HP Labs in Palo Alto, California. From 1984 through 1996 he was also a consulting assistant/associate professor in the department of Electrical Engineering at Stanford University. He received his PhD in Electrical Engineering from Stanford University in 1984.
He started his contributions to high-performance microprocessors as one of the principal architects and the lead designer of the Stanford MIPS microprocessor. While at Digital Equipment Corporation's Western Research Lab he was the principal architect and lead designer of the MultiTitan and BIPS microprocessors. He has also contributed to the architecture and implementation of graphics accelerators, and has conducted extensive research in telepresence. He holds more than 25 U.S. patents and has published over 100 technical papers. He currently serves as ACM SIGARCH Chair and is a Fellow of the IEEE.