Scalable performance analysis and prediction techniques for terascale computing
Terascale computing is commonplace in today's HPC world. For example, at LLNL, users have a choice of five separate terascale systems. With ASCI Purple, BlueGene/L, and an 11 TF Linux cluster, LLNL will continue this trend into the foreseeable future. We have found that this unprecedented degree of parallelism exposes performance limitations in new and existing applications. However, analyzing and predicting performance at this scale is challenging for several reasons. First, performance analysis techniques must strike the appropriate balance between instrumentation resolution and overhead. Second, users and architects must gain insight from these potentially massive datasets. Third, to predict application performance, designers need efficient simulation strategies for these large, complex datasets.
To this end, we propose three new solutions, one for each of these challenges. First, we introduce a novel message sampling technique for MPI applications that reduces instrumentation overhead dramatically and allows runtime analysis of performance data. Second, we help users distill massive performance datasets by using decision tree classification, a supervised machine learning technique, to classify the performance of an application's individual communication operations. Third, to accelerate the practice of performance prediction, we use tracing to capture high-level communication and computation behaviors, and then use a trace-driven simulator to experiment with the architectural design space.
Experimental results from numerous applications on systems demonstrate that our new solutions can improve the process of performance analysis.
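As a rough illustration of the first idea in the abstract, the sketch below records only a random fraction of message events and scales the sampled total back up. It is a minimal sketch under stated assumptions, not the speaker's actual technique: the function names, the fixed sampling rate, and the synthetic message sizes are all hypothetical, and no MPI library is involved.

```python
import random

def sample_messages(message_sizes, rate, seed=0):
    """Keep each message record with probability `rate` (hypothetical sampler)."""
    rng = random.Random(seed)
    return [size for size in message_sizes if rng.random() < rate]

def estimate_total_bytes(sampled_sizes, rate):
    """Scale the sampled byte count back up to estimate the full total."""
    return sum(sampled_sizes) / rate

# Synthetic workload (assumption): 10,000 messages of 100 bytes each.
sizes = [100] * 10000
sampled = sample_messages(sizes, rate=0.1)
estimate = estimate_total_bytes(sampled, rate=0.1)
print(len(sampled), "records kept; estimated total bytes:", estimate)
```

Storing roughly one record in ten cuts the instrumentation data by about the same factor, at the cost of a statistical error in the estimate.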
Dr. Jeffrey Vetter is a computer scientist at the Center for Applied Scientific Computing (CASC) at Lawrence Livermore National Laboratory. His research interests are in the areas of experimental software systems and architectures for high-performance computing. Jeffrey earned his Ph.D. in Computer Science from the Georgia Institute of Technology. Personal URL: http://www.llnl.gov/casc/people/vetter.
FastMATH 2GHz Matrix-Math Enhanced Embedded Processor
What can run MIPS(TM)-compatible instructions and do 64 billion operations per second? The F14M902 microprocessor from Intrinsity. This microprocessor combines a MIPS-32 compatible core with a matrix processing unit, both of which operate at 2 GHz. The architecture and microarchitecture balance a high-speed circuit approach with high-bandwidth compute capabilities and a robust memory system. The design was informed by a close understanding of major connections, wire delays, critical timing paths, data motion, critical code sequences, and major compute elements. The result is a design that is capable of the aforementioned 64 GOPS and that delivers performance on useful software like FFTs that is 5-10 times that of other known C-programmable designs. The presentation will cover the design, the tradeoffs made for it, and some of the results that the microprocessor can achieve.
Mike has 20 years of experience in the design of microprocessors and microprocessor-based systems. He is most interested in melding ideas, technology and human talent into sellable products. Most recently he led the design effort of Intrinsity's first product, the FastMATH microprocessor, the first embedded microprocessor designed to run at over 2 GHz.
Also, at Intrinsity, he led a design team doing contract designs of microprocessors for other companies. Mike is currently the Project Manager for the Intrinsity 2 GHz FastMATH microprocessor. Previously, he worked at Motorola where he was part of the PowerPC and 88000 development groups.
Prior to Motorola, Mike worked at a Swiss company, Netstal Machinery, where he designed digital closed-loop control software (OS and control) and hardware for precision hydraulic systems. He began his career at Data General, where he designed graphics systems, personal computer systems, and workstations.
Mike is an Aggie, but don't hold that against him.
IBM T.J. Watson Research Center
Evolution of Processor Chip Architectures
Trends in lithography and process technology indicate that computer chips will have multiple billions of transistors before the end of the decade. Will such large numbers of transistors be used to implement dynamic learning techniques to improve uniprocessor performance? Or, will they be used to pack hundreds of functional units in some regular fashion? This talk suggests that inertia in programming models will limit changes to system architecture in the short term -- the available transistors will likely be used to bring more of the system as we know it today onto the chip. The talk will address the architectural options provided by such large densities on a chip, discuss the implications on programming and tool development, and suggest areas of research for the computer architecture community.
Dr. Ravi Nair graduated from the University of Illinois with a Ph.D. in computer science in 1978. He has worked at the IBM Thomas J. Watson Research Center since then. He has also taught at Princeton University, where he was on sabbatical, and at Columbia University. Dr. Nair's current interests include multiprocessor and uniprocessor architecture, embedded systems, and virtual machine technology. Dr. Nair is a member of the IBM Academy of Technology and a Fellow of the IEEE.
IBM T.J. Watson Research Center
The K42 research operating system
K42 is a new Linux-compatible research operating system kernel for 64-bit shared memory multiprocessors. Each virtual and physical resource, e.g., open file, memory region, page table, is managed by a separate object instance. This model provides the standard software engineering benefits (portability, maintainability, extensibility), but, more importantly: 1) allows customization on a resource by resource basis and 2) allows accesses to different resources to be efficiently handled in parallel. Individual objects can be "hot-swapped" with new implementations based on current or expected use and/or to selectively upgrade the system with bug, security, or performance fixes without bringing it down.
We will give a brief overview of K42, describing some of the key technology and the newest performance and scalability results, and we will discuss where we are going with the system. One of the fundamental goals of this project has been to develop an operating system platform that not only has performance and functionality advantages over existing systems, but which can be used as a platform to more easily study research questions and then transfer technology into commercial systems like Linux. K42 is freely available to collaborators under a GPL license. We will discuss some of the success stories in technology already being transferred to Linux and then touch on a few of the interesting areas of research we would like to explore with the system (e.g., application directed customization, scalability, fault tolerance, virtualization, real-time, and security).
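The per-resource object model and hot-swapping described in the abstract can be pictured with a small sketch. This is an illustration only, under stated assumptions: the class names, the table, and the `hot_swap` method are hypothetical and are not K42 interfaces, and a real kernel would also have to deal with calls that are in flight during the swap, which this sketch ignores.

```python
class LinearFile:
    """Hypothetical file object with a plain read path."""
    def __init__(self, data=b""):
        self.data = data

    def read(self, offset, length):
        return self.data[offset:offset + length]


class CachedFile:
    """Hypothetical replacement that remembers the results of earlier reads."""
    def __init__(self, data=b""):
        self.data = data
        self.cache = {}

    def read(self, offset, length):
        key = (offset, length)
        if key not in self.cache:
            self.cache[key] = self.data[offset:offset + length]
        return self.cache[key]


class ObjectTable:
    """Maps each resource ID to the separate object instance that manages it."""
    def __init__(self):
        self._objects = {}

    def register(self, resource_id, obj):
        self._objects[resource_id] = obj

    def lookup(self, resource_id):
        return self._objects[resource_id]

    def hot_swap(self, resource_id, new_cls):
        """Replace one resource's implementation, carrying its state over."""
        old = self._objects[resource_id]
        self._objects[resource_id] = new_cls(old.data)


table = ObjectTable()
table.register("file:a", LinearFile(b"hello world"))
table.register("file:b", LinearFile(b"other data"))

# Swap the implementation behind one resource; the other is untouched.
table.hot_swap("file:a", CachedFile)
print(type(table.lookup("file:a")).__name__, type(table.lookup("file:b")).__name__)
```

Because callers reach each resource through the table, one resource can be customized or upgraded without touching the objects that manage any other resource.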
Orran Krieger is the manager of the advanced operating system research team at IBM T.J. Watson Research Center. He received a BASc from the University of Ottawa in 1985, an MASc from the University of Toronto in 1989, and a PhD from the University of Toronto in 1994, all in Electrical and Computer Engineering. He was one of the main architects and developers of the Hurricane and Tornado operating systems at the University of Toronto, and was heavily involved in the architecture and development of the Hector and NUMAchine shared-memory multiprocessors. Currently, he is project lead on the K42 operating system project at IBM T.J. Watson Research Center, and an adjunct associate professor in computer science at CMU. His research interests include operating systems, file systems, and computer architecture.
Moore's Law: A Time for Reflection & Refraction Petaflops in 2009/10
This presentation discusses the technologies, both semiconductor and photonic, that are being used to build high performance computing systems. We will examine Moore's Law to see whether past trends are a good predictor of future systems. In particular, we will explore the possible architectures of a PetaFlop system in 2009.
Steve Wallach is currently VP of Technology at Chiaro Networks, an adviser to CenterPoint Ventures and a consultant to the US DOE ASCI program. Previously, he was co-founder of Convex Computers and their Chief Technology Officer and Senior VP of Development. After Hewlett-Packard bought Convex, Wallach became the Chief Technology Officer of the Large Systems Group. He was also a Visiting Professor at Rice University, 1998-1999. Before Convex, he managed Advanced Development for Data General. His efforts on the MV/8000 are chronicled in "The Soul of a New Machine." Wallach is a member of the National Academy of Engineering and The Presidential Information Technology Advisory Committee (PITAC). He has 33 patents.