Issues in Compiler Research

Position Paper for NSF Workshop on
New Challenges and Directions for Systems Research
St. Louis, MO, July 31 - August 1, 1997

Siddhartha Chatterjee

Department of Computer Science
The University of North Carolina at Chapel Hill

Introduction

The rapid development of computer and communication systems and the growth of innovative applications provide new challenges and opportunities for research in programming languages, compilers, and runtime systems. This research area, situated at the nexus of systems research, will be critical to the development of applications that exploit high levels of abstraction and functionality while achieving high performance.

I see the following as key problems in this area of research: memory hierarchy management, performance in the face of abstraction, and domain-specific compression. The following sections describe these items in greater detail.

Memory hierarchy management

The increasing disparity between CPU and memory speeds over the past two decades has resulted in the use of multi-level memory hierarchies that attempt to exploit locality of reference in programs and hide memory latencies. The performance of many of today's complex applications is determined by the characteristics of the memory subsystem rather than by the operation count of the program, and there is often a dramatic difference between the peak advertised CPU performance and the observed performance of a given application. The widening gap between processor and memory performance is likely to continue, resulting in even deeper memory hierarchies.

An integrated solution to the memory hierarchy management problem must deal with resource allocation and management at all levels of the hierarchy: good interprocedural allocation of machine registers, spatial and temporal locality in multiple levels of data cache, code layout in the instruction cache, virtual-to-physical address mappings in the translation lookaside buffer, page-level locality in main memory, I/O buffers in secondary storage, and local/remote memory and memory consistency issues in distributed shared memory systems. The state of practice is that users restructure programs manually to improve memory performance, and the tools available for this task are limited to specific subproblems or to subclasses of programs. This lack of automatic tools for memory hierarchy management has obvious negative consequences for the readability, portability, and maintainability of code. A typical example of this phenomenon is seen in high-performance mathematical library codes such as ScaLAPACK, where much of the complexity of a typical routine is independent of the numerical computing issues and arises instead from trying to do a good job of load balancing and memory management.
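To make the flavor of such manual restructuring concrete, the sketch below shows the classic tiling transformation applied to matrix multiplication, in the spirit of the blocked algorithms studied by Lam et al. [10]. The code is illustrative rather than drawn from any library; in particular, the tile size B is a hypothetical parameter that in practice must be tuned to the cache parameters of each target machine, the kind of tuning automated by systems such as PhiPAC [1].

    #include <stddef.h>

    #define N 512   /* problem size; illustrative only */
    #define B 32    /* tile size; hypothetical, must be tuned per machine */

    /* Naive triple loop: streams through memory with little reuse,
     * so for large N performance is dominated by cache misses. */
    void matmul_naive(double A[N][N], double Bm[N][N], double C[N][N])
    {
        for (size_t i = 0; i < N; i++)
            for (size_t j = 0; j < N; j++)
                for (size_t k = 0; k < N; k++)
                    C[i][j] += A[i][k] * Bm[k][j];
    }

    /* Tiled version: works on B-by-B blocks that fit in cache,
     * improving temporal locality at a clear cost in readability. */
    void matmul_tiled(double A[N][N], double Bm[N][N], double C[N][N])
    {
        for (size_t ii = 0; ii < N; ii += B)
            for (size_t jj = 0; jj < N; jj += B)
                for (size_t kk = 0; kk < N; kk += B)
                    for (size_t i = ii; i < ii + B; i++)
                        for (size_t j = jj; j < jj + B; j++)
                            for (size_t k = kk; k < kk + B; k++)
                                C[i][j] += A[i][k] * Bm[k][j];
    }

The transformed loop computes exactly the same result; every line of added complexity exists solely to match the code to the memory hierarchy, which is precisely the kind of bookkeeping that ought to be automated.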

The state of the art in tools for this problem varies widely.

Good memory locality is a prerequisite for high performance. The fundamental challenges in this problem are twofold, and progress in this area will depend on collaboration among several research areas of computing systems.

Performance in the face of abstraction

The growth of problem solving environments and object-oriented programming techniques means that programs are increasingly being written as collections of modules rather than as monolithic entities. Application programs are often little more than the glue that binds together calls to library routines (e.g., OpenGL in graphics programming, the Java API class libraries, and numerical libraries such as LAPACK). A similar layering exists in operating systems, although the reason there is often protection rather than abstraction. Such layering and modularity are indispensable for controlling the complexity and ensuring the safety of applications, but they pose a difficult challenge for delivering high performance from such applications [2].
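As a small illustration (hypothetical code, not drawn from any particular library), consider a vector abstraction whose representation and accessors are compiled in a separate module. Without cross-module analysis or inlining, the compiler must issue opaque function calls on every iteration and can neither hoist the loop bound nor keep the data in registers:

    #include <stddef.h>

    /* vector.h -- a hypothetical abstract vector interface; the
     * representation and accessor bodies live in a separate vector.c. */
    typedef struct vector vector;
    size_t vec_len(const vector *v);
    double vec_get(const vector *v, size_t i);

    /* client.c -- each iteration pays two function calls across the
     * module boundary; with cross-module inlining, the loop could
     * collapse into a direct sweep over the underlying array. */
    double vec_sum(const vector *v)
    {
        double s = 0.0;
        for (size_t i = 0; i < vec_len(v); i++)
            s += vec_get(v, i);
        return s;
    }

The abstraction costs nothing semantically, but each layer hides exactly the information the optimizer needs.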

The following characteristics of this problem make it both challenging and profitable.

Breakthroughs in this problem would have widespread impact in diverse areas, ranging from faster processing of communication protocol stacks to speeding up the verification process for Java bytecodes.

Progress in this problem needs to be made at different levels.

There is a need to develop consensus and standards within the compiler community about the kinds of capabilities that separately compiled modules must support. The benefits of standards for representing such information are clear.

Domain-specific compression

With the spread of network computing, there is a growing need for efficient encodings of multimedia data and object code in order to conserve network resources during transmission, disk resources during storage, and processor resources during interpretation. In this section I address a restricted form of this issue as it applies to code (such as Java class files) that a client may download from a server.

Text compression techniques can be applied naively to object code files, but more leverage can be gained by exploiting the special characteristics of instruction and data streams. Ernst et al. [7] have demonstrated greater improvements by exploiting semantic information in instruction streams, but this work has barely scratched the surface of what is possible in this area.
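One simple instance of stream-aware compression is sketched below, under the assumption of a hypothetical fixed-format bytecode (one opcode byte followed by two operand bytes per instruction): the interleaved instruction stream is split into separate opcode and operand streams before each is handed to a general-purpose compressor. Each split stream is more self-similar than the original, so the compressor typically finds more redundancy.

    #include <stddef.h>

    /* Hypothetical fixed format: 1 opcode byte + 2 operand bytes. */
    enum { OPERAND_BYTES = 2 };

    /* Split an instruction stream into an opcode stream and an
     * operand stream.  The caller compresses each stream separately
     * with a conventional text compressor. */
    void split_streams(const unsigned char *code, size_t ninsns,
                       unsigned char *opcodes, unsigned char *operands)
    {
        const size_t insn_size = 1 + OPERAND_BYTES;
        for (size_t i = 0; i < ninsns; i++) {
            opcodes[i] = code[i * insn_size];
            for (size_t j = 0; j < OPERAND_BYTES; j++)
                operands[i * OPERAND_BYTES + j] = code[i * insn_size + 1 + j];
        }
    }

Semantic approaches such as that of Ernst et al. [7] go further, but even this simple stream separation illustrates how much structure a generic byte-oriented compressor leaves unexploited.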

Many possible research opportunities exist here, all with high payoff.

Conclusions

Compilers and related technologies have a key role to play in the next generation of systems and applications. A good example of this is the work of Eide et al.[6], in which fairly standard compiler optimizations were applied to RPC stub generation to demonstrate impressive gains in end-to-end performance. Similar opportunities undoubtedly exist in other application areas. The compiler community needs to be driven by the problems faced by such new application domains.

Bibliography

  1. J. Bilmes, K. Asanovic, C.-W. Chin, and J. Demmel, "Optimizing Matrix Multiply using PhiPAC: A Portable, High-Performance, ANSI C Coding Methodology", Proceedings of International Conference on Supercomputing, Vienna, Austria, July 1997. To appear.
  2. J. C. Brustoloni, "Effects of Buffering Semantics on I/O Performance", Proceedings of OSDI'96, USENIX, Seattle, WA, pp. 277-291, October 1996.
  3. B. Calder, D. Grunwald, and B. Zorn, "Quantifying Behavioral Differences Between C and C++ Programs". Journal of Programming Languages 2(4), pp. 313-351, 1994.
  4. P. Cao, E. W. Felten, A. R. Karlin, and K. Li, "A Study of Integrated Prefetching and Caching Strategies", Proceedings of ACM SIGMETRICS'95, pp. 188-197, May 1995.
  5. T. H. Cormen and M. Hirschl, "Early Experiences in Evaluating the Parallel Disk Model with the ViC* Implementation". To appear in Parallel Computing. Also Dartmouth College Computer Science Technical Report PCS-TR96-293, August 1996.
  6. E. Eide, K. Frei, B. Ford, J. Lepreau, and G. Lindstrom, "Flick: A Flexible, Optimizing IDL Compiler", Proceedings of PLDI'97, pp. 44-56, June 1997.
  7. J. Ernst, W. Evans, C. W. Fraser, S. Lucco, and T. A. Proebsting, "Code Compression", Proceedings of PLDI'97, pp. 358-365, June 1997.
  8. M. Kandemir, J. Ramanujam, and A. Choudhary, "Improving the Performance of Out-of-Core Computations", Proceedings of the International Conference on Parallel Processing, Bloomingdale, IL, August 1997. To appear.
  9. I. Kodukula, N. Ahmed, and K. Pingali, "Data-Centric Multi-Level Blocking", Proceedings of PLDI'97, pp. 346-357, June 1997.
  10. M. S. Lam, E. E. Rothberg, and M. E. Wolf, "The Cache Performance and Optimization of Blocked Algorithms", Proceedings of ASPLOS IV, pp. 63-74, April 1991.
  11. G. C. Necula and P. Lee, "Proof-Carrying Code", Technical Report CMU-CS-96-165, Carnegie Mellon University, September 1996.
  12. V. P. Pauca, X. Sun, S. Chatterjee, and A. R. Lebeck, "Architecture-efficient Strassen's Matrix Multiplication: A Case Study of Divide-and-Conquer Algorithms", Proceedings of the International Linear Algebra Society (ILAS) Symposium on Fast Algorithms for Control, Signals and Image Processing, Winnipeg, Manitoba, Canada, June 1997.
