Research Summary
| |
My research takes a broad view of how languages and compilers can improve
system performance and programmer productivity. My projects pursue two
general directions. The first direction, which is software-driven, attempts
to simplify programming by raising the level at which we create and optimize
programs. The second direction, which is technology-driven, devises compiler
techniques for future silicon technology trends, where, for example, in 35 nm
technology the latency across a single chip is expected to be about 30
cycles.
Software-driven projects
- Broadway: Compiler techniques for optimizing software libraries
- Compiler support for improving the fault tolerance of supercomputing
clusters
- Java Layers: Language and compiler support for component-based programming
Technology-driven projects
- Branch prediction for future technologies
- Compiling for the TRIPS Processor
- Characterizing performance tradeoffs of future microprocessors
The remainder of this page describes each of these projects in more
detail.
|
 |
Domain-Specific Compiler
Optimizations:
The Broadway Compiler
| |
This project seeks to make existing and future libraries more usable, more
efficient, and more portable. Our Broadway Compiler is a C compiler that can
be configured with domain-specific information to optimize the use and
implementation of software libraries. The key to this compiler is a simple
annotation language that can express domain-specific information which cannot
be inferred through conventional static analysis. Our approach provides a
separation of concerns that shields the application programmer from the
annotations and that allows programmers to use simple, clean interfaces.
Our new optimizations can be divided into three categories:
- Domain-specific generalizations of classic optimizations
- Techniques for optimizing sequences of library calls
- Machine-specific optimizations
We are also adding support for Java class libraries, which we intend to
evaluate by optimizing the Java Swing library.
Publications:
-
An Annotation Language for Optimizing Software Libraries
with S. Guyer
Second Conference on Domain Specific Languages.
October, 1999.
pp. 39-53
-
Optimizing the Use of High Performance Software Libraries
with S. Guyer
Languages and Compilers for Parallel Computing.
S. Midkiff, J. Moreira, M. Gupta, S. Chatterjee, J. Ferrante, J. Prins, W.
Pugh, and C. Tseng eds. Springer-Verlag 2002, pp. 227-243.
-
Broadway: A Software Architecture for Scientific Computing
with S. Guyer
The Architecture of Scientific Software,
R.F. Boisvert and P.T.P. Tang, editors,
Kluwer Academic Press, 2000, pp. 175-192.
-
Customizing Software Libraries for Performance Portability
with E. Berger and S. Guyer
10th SIAM Conference on Parallel Processing for Scientific
Computing.
March, 2001.
-
Detecting Errors with Configurable Whole-Program Dataflow Analysis
with E. Berger and S. Guyer
UTCS TR-02-04,
February, 2002.
-
Client-Driven Pointer Analysis
with S. Guyer
10th Annual International Static Analysis Symposium.
June, 2003. pp. 214-236.
- Incorporating Domain-Specific
Information into the Compilation Process.
S. Guyer
PhD Thesis, Dept. of Computer Sciences, April, 2003.
-
Broadway: A Compiler for Exploiting the Domain-Specific Semantics of Software
Libraries
with S. Guyer
Proceedings of the IEEE,
Special issue on program generation, optimization, and adaptation.
93(2), 2005, pp. 342-357.
-
Error Checking with Client-Driven Pointer Analysis
with S. Guyer
Science of Computer Programming Journal,
vol 58, 2005, pp. 83--114.
-
Efficient Flow-Sensitive Interprocedural Data-flow Analysis in the Presence
of Pointers
with Teck Bok Tok and Samuel Z. Guyer
Compiler Construction.
Springer-Verlag LNCS 3923, 2006, pp. 17-31.
Personnel:
Samuel Guyer,
Walter Chang,
Ben Hardekopf,
Teck Tok
|
 |
Scalable Fault Tolerance for Supercomputing Clusters
| |
This project seeks to provide fault tolerance for the world's fastest
supercomputing clusters. The supercomputers at Los Alamos, Sandia, and
Livermore National Laboratories consist of thousands of nodes, which
collectively suffer from poor reliability. The mean-time-to-failure (MTTF)
of today's 6,000 node systems is about 1.6 hours, and the MTTF for next
year's 16,000 node ASCI Q machine is expected to be 20 minutes.
This project will customize the Egida library-- which has been implemented
underneath MPI-- for the specific needs of scientific computations. Whereas
most fault tolerance protocols make worst-case assumptions about the behavior
of the participating processes, we will use compilers to analyze the specific
behavior of the applications to determine their specific fault-tolerance
needs. This project builds on previous work on Egida by Lorenzo Alvisi and Harrick Vin.
Publications:
Personnel:
Lorenzo Alvisi,
Sung-Eun Choi,
Alison Norman,
|
 |
Component-Based Programming with Java Layers
| |
This project aims to simplify the creation and maintenance of large, complex
software, particularly families of related software versions. Java Layers
(JL) is a language that accomplishes this goal by extending Java in three
ways:
- JL lets users trivially create applications from components by
writing simple type equations.
- JL provides a simple mechanism for creating components. These
components, or layers,are a type of Java class that obey
certain restrictions on their interfaces.
- JL allows programmers to specify design rules, which provide
semantic constraints about how layers can be meaningfully composed.
We have recently finished our Java Layers compiler, and we are currently
evaluating the language by using it to build a family of Widget libraries
for diverse platforms that include a PC, PalmOS, and cell phones. A key
scientific question is whether a domain-independent language such as JL can
achieve the same level of performance as domain-specific languages, which can
hard-code domain-specific optimizations into their compilers. Thus, we are
exploring ways to allow domain-specific information to be cleanly specified
in JL, and we are exploring ways to use such information in the compilation
process.
JL is an instance of Batory's GenVoca model of software generators.
For more information on GenVoca, see Don Batory's research page.
Publications:
-
Comparing Layered Refinement and Frameworks
with R. Cardone
23rd Int'l Conference on Software Engineering.
May, 2001. pp. 285-294.
-
Using Mixins to Build Flexible Widgets
with R. Cardone, A. Brown, and S. McDirmid
1st International Conference on Aspect-Oriented Software
Development
April, 2002. pp. 76-85.
- Language and Compiler Support for
Mixin Programming.
R. Cardone
Ph.D. Thesis, Dept. of Computer Sciences. April, 2002.
-
Using Mixin Technology To Improve Modularity
with R. Cardone
Aspect-Oriented Software Development,
Mehmet Aksit, Siobhan Clarke, Tzilla Elrad, and Richard Filman, eds.
Addison-Wesley, 2003. pp. 219--242.
Project Page:
Java Layers Research Page
Personnel:
Rich Cardone,
Adam Brown
|
 |
Branch Prediction for Future Technologies
| |
This project explores ways to widen the interface between compilers and
hardware. We are currently focusing on branch prediction, with a particular
eye towards the constraints of future technologies. As latencies within a
chip become increasingly significant, there is a growing tradeoff between
branch predictor accuracy and predictor delay: Complex predictors with large
tables can achieve high accuracy, thereby increasing IPC, but large tables
lead to either multiple cycle access or to limits on clock rate, both of
which reduce IPC. We are studying various mechanisms for dealing with delay
and accuracy:
- Hierarchical predictor organizations
- Neural systems that replace two-bit counters
- Techniques that shift some of the burden of branch prediction to the
compiler, including a novel way of encoding in each branch instruction a
boolean function that remove the tables altogether
Publications:
Personnel:
Daniel Jiménez,
Heather Hanson,
Stephen W.
Keckler
|
 |
Compiling for the Adaptive TRIPS Microprocessor
| |
The vertically integrated TRIPS project is motivated by the increasing
intra-chip latencies of silicon technology, where in 35nm technology with
aggressive clock rates, it is expected to take 30 cycles to cross a chip.
Two key aspects of the TRIPS project are (1) a novel microarchitecture that
is designed to scale to 35nm technology, and (2) a highly adaptive computing
system that provides an unprecedented level of feedback, reconfiguration, and
integration between the hardware and software.
The TRIPS compiler project focuses on three issues. First, we are studying
code scheduling strategies for the TRIPS Grid Processor, which exposes the
latencies amongst a 2D grid of ALU's, and between the ALU's and the memory
system. Second, we are studying ways to create efficient libraries for this
flexible piece of hardware. The goal is to provide a number of LibOS's
(Library Operating Systems), which are each tailored for a different hardware
configuration by using the Broadway approach to customizing libraries.
Finally, we are studying ways that a compiler can create self-adaptive
programs that can dynamically negotiate for resources. We are thus
developing techniques for identifying phase behavior with respect to various
types of behavior, including memory access, communication, and CPU
utilization. We are also developing methods of estimating resource
requirements for each phase in a program.
Publications:
-
Scaling to the End of Silicon with EDGE Architectures
with D. Burger, S. Keckler, M. Dahlin, L. John, K. McKinley, C. Moore,
J. Burrill, R. McDonald, and W. Yoder.
IEEE Computer,
37(7), July, 2004, pp. 44-55.
-
Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE
Architectures
with Ramadass Nagarajan, Sundeep K. Kushwaha, Doug Burger, Stephen W. Keckler,
Kathryn S. McKinley
Int'l Conference on Parallel Architectures and Compilation Techniques.
October, 2004, pp. 74-84.
Personnel:
Ben Rister,
Kathryn McKinley
Other TRIPS Personnel:
|
 |
Characterizing Performance Tradeoffs of Modern Microprocessors
| |
As memory latencies continue to grow relative to processor speeds,
microprocessors will rely on increasingly sophisticated latency-hiding
mechanisms. These new hardware features present new challenges and tradeoffs
for compilers. For example, loop fusion and array contraction, which
typically improve performance by improving locality, can decrease performance
when hardware prefetching units are provided. This project studies such
tradeoffs and develops new compilation strategies for future microprocessors,
as exemplified by the IBM Power 4. Our goals are (1) to develop cost models
that capture the effects of modern hardware, including cache effects and
sophisticated prefetching hardware, and (2) to use these models to guide new
compilation strategies.
Publications:
Personnel:
Ibrahim Hur,
Kathryn McKinley
|
 |
C-Breeze Compiler Infrastructure
| |
C-Breeze is a compiler infrastructure that's intended to be small and easy to
use, both as a teaching tool and a research tool. C-Breeze is currently a
source-to-source translator for ANSI C (ISO 9899), but we are in the process
of creating a PowerPC back-end, and we are adding support for object oriented
optimizations so we can compile Java programs (producing Java source as
output).
C-Breeze is written in C++ and is organized as a series of phases. Phases
are provided to parse source code into an AST, to dismantle the AST to a low
level intermediate representation, to build a control flow graph, and to
convert to SSA form. Various other analysis and optimization phases are
available, such as a context-sensitive interprocedural pointer analysis, PRE
using SSA form, a configurable interprocedural dataflow analysis engine, and
more mundane phases such as liveness analysis and constant propagation. New
phases can be added by using the Visitor design pattern through built-in
phases to traverse and change the AST.
We plan to release this software shortly.
See the C-Breeze Home Page for more
information.
Personnel:
|
 |
Other Projects:
|
 |