LogTM: Log-based Transactional Memory
TRANSACTIONAL MEMORY (TM) simplifies parallel programming by guaranteeing that transactions appear to execute atomically and in isolation. Implementing these properties includes providing data version management for the simultaneous storage of both new (visible if the transaction commits) and old (retained if the transaction aborts) values. Most (hardware) TM systems leave old values "in place" (the target memory address) and buffer new values elsewhere until commit. This makes aborts fast, but penalizes (the much more frequent) commits.
In this work, we present an implementation of transactional memory, LOG-BASED TRANSACTIONAL MEMORY (LogTM), that makes commits fast by storing old values to a per-thread log in cacheable virtual memory and storing new values in place. LogTM makes two additional contributions. First, LogTM extends a MOESI directory protocol to enable both fast conflict detection on evicted blocks and fast commit (using lazy cleanup). Second, LogTM handles aborts in (library) software with little performance penalty. Evaluations running micro- and SPLASH-2 benchmarks on a 32-way multiprocessor support our decision to optimize for commit by showing that only 1-2% of transactions abort.
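The version-management idea described above (new values written in place, old values saved to a per-thread log) can be illustrated with a small software sketch. This is only a toy model, not the LogTM hardware or its software library: the class name `UndoLogTransaction`, its methods, and the list standing in for memory are all hypothetical, and conflict detection is omitted entirely.

```python
class UndoLogTransaction:
    """Toy model: new values go in place, old values go to an undo log.

    Hypothetical illustration only; it does not model LogTM's hardware,
    its directory protocol, or conflict detection.
    """

    def __init__(self, memory):
        self.memory = memory   # list standing in for memory (index = address)
        self.log = []          # undo log of (address, old_value) pairs
        self.logged = set()    # addresses whose old value is already saved

    def write(self, addr, value):
        # Save the old value only on the first write to each address.
        if addr not in self.logged:
            self.log.append((addr, self.memory[addr]))
            self.logged.add(addr)
        self.memory[addr] = value  # new value is stored in place

    def commit(self):
        # Commit is cheap: new values are already in place, so just drop the log.
        self.log.clear()
        self.logged.clear()

    def abort(self):
        # Abort is the slow path: walk the log backwards, restoring old values.
        while self.log:
            addr, old_value = self.log.pop()
            self.memory[addr] = old_value
        self.logged.clear()


memory = [10, 20, 30]
tx = UndoLogTransaction(memory)
tx.write(0, 11)
tx.write(0, 12)
tx.write(2, 33)
tx.abort()  # memory is restored to [10, 20, 30]
```

In this sketch, commit does a constant amount of bookkeeping while abort does work proportional to the number of logged writes, which mirrors the trade-off the abstract argues for when aborts are rare.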
I will also touch upon recent results for supporting transactional nesting (closed and open nesting with partial aborts) and invoking non-transactional system code (escape actions).
Cf. http://www.cs.wisc.edu/multifacet/papers/hpca06_logtm.pdf, http://www.cs.wisc.edu/multifacet/papers/asplos06_nested_logtm.pdf, or the Wisconsin Multifacet home page (http://www.cs.wisc.edu/multifacet/).
MARK D. HILL (http://www.cs.wisc.edu/~markhill) is a professor in both the computer sciences department and the electrical and computer engineering department at the University of Wisconsin-Madison, where he also co-leads the Wisconsin Multifacet (http://www.cs.wisc.edu/multifacet/) project with David Wood. His research interests include parallel computer system design, memory system design, and computer simulation. He earned a PhD from the University of California, Berkeley. He is an ACM Fellow and a Fellow of the IEEE.
University of Pisa
Way Adaptable D-Nuca Cache
Power consumption is an increasingly pressing problem in the design of high-performance CPUs; leakage (static) power will rapidly become a major source of such dissipation. Different techniques to reduce static power consumption have been developed in the research environment. The Way Adaptable D-Nuca cache represents the results of our activity on applying such techniques to large, wire-delay-dominated L2 caches.
In a Way Adaptable D-Nuca cache, by exploiting typical features of D-NUCA design, we dynamically activate or deactivate entire ways on the basis of the application working set, thus adapting the size of the powered-on portion of L2 cache to the needs of the running applications. This implies a reduction of L2 cache static power consumption with negligible performance degradation... and also some other small but perhaps surprising effects.
Pierfrancesco Foglia is an Assistant Professor at the Information Engineering Department of the University of Pisa. His research interest lies in Computer Architecture, including Coherence Protocols, Cache memories, and Operating System effects. His other research interests include Computer Networks and Computer Systems Usability. He has developed for Siemens ICN a manager for a network of GSM devices and, in the framework of the EU SPP project, he defined coherence solutions for a cartographic multiprocessor system. He received his MS and Ph.D. from the University of Pisa.
IBM T.J. Watson Research Center
Optimizing system architecture with a holistic design approach in the Blue Gene Supercomputer
Technology has been a main performance driver of many system generations, leveraging CMOS scaling to increase clock speed and build increasingly complex microarchitectures. As technology-driven performance gains become increasingly hard to achieve from device scaling alone, innovative system architecture must take its place.
We will discuss how technology has matured, and its impact on microprocessor and system architecture. Increasingly, to optimize performance for a system, a holistic approach optimizing across the entire hardware and software stack must be considered to optimize for a range of metrics: performance, power, power/performance, reliability and ease of use.
We will describe how this integrated design approach helped shape the Blue Gene/L supercomputer. Blue Gene was designed from the ground up with a focus on power/performance efficiency and reliability. The ultimate goal was to achieve extreme scalability and high application performance under the power and thermal constraints of existing data centers. To ensure optimal system operation, Blue Gene/L is an integrated solution combining innovative system software, tools, architecture, system design, and packaging at all levels.
Valentina Salapura is a Research Staff Member with the IBM T.J. Watson Research Center. Dr. Salapura has been a technical leader for the Blue Gene program since its inception. She has contributed to the architecture and implementation of several generations of Blue Gene Systems focusing on multiprocessor interconnect and synchronization and multithreaded architecture design and evaluation. Before joining IBM, Dr. Salapura was Assistant Professor with the Dept of Computer Engineering at Technische Universität Wien. She is the co-author of a submission which is currently a finalist for the 2006 Gordon Bell Award, the author of over 60 papers on processor architecture and high-performance computing, and holds many patents in this area. She received the Ph.D. degree from Technische Universität Wien, Vienna, Austria, and MS degrees in Electrical Engineering and Computer Science from University of Zagreb, Croatia.
Future CPU Architectures: the Shift from Traditional Models
While Moore's law is alive and well in silicon scaling technology, it is clear that microprocessors have encountered significant technical issues that will influence the overall direction of future architectures. This talk discusses the recent history of Intel microprocessors and some of the rationale that guided the development of those processors. Further, the talk highlights why future microprocessor architectures will likely look different from those of the past.
The traditional microprocessor architecture uses hardware techniques such as out-of-order processing to extract higher performance out of applications that have little or no explicit parallelism. The hardware techniques employed in the past have continued to improve performance, but at the cost of significantly increasing the power consumption of traditional microprocessors. The power increases have led not only to higher electrical power delivery costs, but also to higher costs for dissipating the power, resulting in high ambient noise, larger enclosures and hotter laps. To avoid a future that requires asbestos-based jeans to properly handle laptops, the microprocessor architecture must change to facilitate higher performance without significantly higher power.
It is likely that microprocessor architecture will evolve from the ubiquitous single core, single threaded machine that we know and love, to an architecture that employs more cores and more threads. This shift is apparent in today's market where general purpose processors have included techniques such as Hyper-Threading Technology and Multi-Core processors. This talk will speculate on some potential next steps for that technology and some of the potential implications on software development.
Doug Carmean is a Principal Architect with Intel's Desktop Products Group in Oregon. Doug was one of the key architects responsible for the definition of the Intel Pentium 4 processor. He has been with Intel for 13 years, working on IA-32 processors from the 80486 to the Intel Pentium 4 processor and beyond. Prior to joining Intel, Doug worked at ROSS Technology, Sun Microsystems, Cypress Semiconductor and Lattice Semiconductor. Doug enjoys fast cars and scary, Italian motorcycles.
Carnegie Mellon University
Fingerprinting: an ingredient in building reliable microprocessors
Many aspects inherent to continued deep-submicron scaling collude to impair the reliability of future microprocessor implementations. This talk develops the idea of "fingerprinting" as an important ingredient for efficient error detection. A fingerprint is a hashed signature of internal state changes of a digital system. For example, when applied at the architectural level, one may compute the fingerprint of the register file and/or cache updates. For the purpose of detecting differences in the mirrored operation of two processors, comparing their fingerprints for agreement is nearly as effective as the daunting alternative of instantaneously comparing all their internal states. We present two applications of fingerprinting. The first employs architectural fingerprinting to support dual-modular-redundant execution in a multi-core processor. Fingerprinting and other techniques combine to enable two mirrored cores to maintain redundant execution and checking without requiring them to be microarchitecturally deterministic or to be in precise lockstep. The second work applies microarchitectural-level fingerprinting to extremely-high-coverage detection of transient failures in the datapath that would normally be masked and go unnoticed at the architectural level. This capability is central to our approach to preemptively detecting the onset of transistor wear-out failures. This talk presents joint work with Prof Babak Falsafi in the TRUSS project (http://www.ece.cmu.edu/~truss/) at the Computer Architecture Lab at Carnegie Mellon (CALCM).
James C. Hoe is an Associate Professor of Electrical and Computer Engineering at Carnegie Mellon University. His research interests include many aspects of computer architecture and digital hardware design. His current research develops architecture and microarchitecture solutions to improve computer reliability. He is also working on a hardware synthesis tool that compiles formal mathematical specification of linear DSP transforms to hardware implementations. He received the B.S. degree in electrical engineering and computer science from University of California at Berkeley in 1992 and the M.S. and Ph.D. degrees in electrical engineering and computer science from Massachusetts Institute of Technology in 1994 and 2000, respectively. For more information, please visit http://www.ece.cmu.edu/~jhoe.
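As a rough illustration of comparing fingerprints rather than full state, the sketch below folds each core's stream of architectural updates into a running CRC-32 and compares the two values once per checking interval. The choice of CRC-32, the (register, value) update format, and the function names `fingerprint` and `first_mismatch` are assumptions made for this example; the abstract does not specify the hash function or the checking interval used in the actual work.

```python
import struct
import zlib


def fingerprint(updates, seed=0):
    """Fold a sequence of (register, value) updates into one 32-bit signature.

    CRC-32 is an assumption for illustration; the talk only says "hashed signature".
    """
    fp = seed
    for reg, value in updates:
        # Pack each update as a 32-bit register id plus a 64-bit value.
        record = struct.pack("<IQ", reg, value & 0xFFFFFFFFFFFFFFFF)
        fp = zlib.crc32(record, fp)
    return fp


def first_mismatch(core_a, core_b, interval):
    """Return the index of the first interval whose fingerprints differ, else None."""
    length = max(len(core_a), len(core_b))
    for start in range(0, length, interval):
        fp_a = fingerprint(core_a[start:start + interval])
        fp_b = fingerprint(core_b[start:start + interval])
        if fp_a != fp_b:
            return start // interval
    return None


# Two mirrored cores retiring the same updates, with one bit flipped on core B.
core_a = [(i % 4, i * 7) for i in range(16)]
core_b = list(core_a)
core_b[9] = (core_b[9][0], core_b[9][1] ^ 1)  # inject a single-bit error
mismatch = first_mismatch(core_a, core_b, interval=4)
```

Only one 32-bit value per interval needs to be exchanged between the cores in this sketch, instead of every update, which is the bandwidth saving the fingerprinting idea relies on. A longer interval exchanges less data but detects an error later.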