The Control Station

John Chapin

MIT Laboratory for Computer Science

Introduction

A control station is a tightly integrated system of multiple computers designed to support a human decision-maker in managing complex time-critical situations. Examples of such situations include stock trading, managing a power grid during a storm, producing a live telecast of a major sporting event, and directing the security detail for a presidential campaign event. A control station collects data flows from multiple sources in multiple formats; analyzes, summarizes, and correlates them; and displays the resulting information to the decision-maker, who then uses the control station to implement and communicate his or her decisions.

This application creates requirements for massive bandwidth, predictable latency, and flexibility that can serve as the stimulus for a broad range of research in computer systems. Research on system support for control stations also has the potential to bring out new synergies among various research areas, particularly hardware architecture, operating systems, compilers and programming languages.

A number of researchers at MIT share this vision of the control station: Anant Agarwal, Saman Amarasinghe, Arvind, John Chapin, Frans Kaashoek, Charles Leiserson, Barbara Liskov, Martin Rinard and Larry Rudolph. This group is jointly responsible for the ideas described here.

Application Requirements

The requirements of control stations are significantly different from those of the scientific, engineering, database and interactive applications that drove the design of current scientific and general-purpose computer systems. Control stations have three primary requirements.

Massive bandwidth

The control station of the future will collect tens or hundreds of video streams, audio streams, discrete time-varying data such as market trades, and continuously updated situational estimates from analysis subsystems. The control station will also manage and search large databases of historical data to support the decision-maker. The challenge in designing such a system is not how many gigaflops it can sustain but how many gigabytes per second it can move between its applications and I/O ports.

Predictable latency

The success of the decision-maker depends critically on his or her response time to situational changes. Therefore the control station must provide predictable latency between the arrival of data on an input port and its display to the decision-maker in whatever analyzed form he or she has requested. Similarly the implementation and communication of the decision-maker's actions out of the system must meet stringent latency goals. This is a significantly different design requirement from that of scientific and interactive applications, in which the system must compute a fixed result in the best possible time. For a control station, the system must compute the best possible result in a fixed amount of time. Therefore the system's performance must be predictable and the load in all hardware and software subsystems must be monitored and exposed to allow the operating system and applications to respond appropriately.
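The contrast between "a fixed result in the best possible time" and "the best possible result in a fixed amount of time" can be illustrated with a small sketch. It assumes the analysis can be written as a sequence of successively better estimates; the names best_result_by_deadline and running_mean are hypothetical and chosen only for this illustration.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def best_result_by_deadline(
    refinements: Iterable[T],
    budget_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[T]:
    """Return the last refinement that was ready before the budget expired.

    Hypothetical helper: `refinements` yields successively better answers,
    and `clock` is injectable so the behavior can be tested deterministically.
    """
    deadline = clock() + budget_s
    best: Optional[T] = None
    for candidate in refinements:
        if clock() > deadline:
            break  # this candidate arrived too late; keep the previous one
        best = candidate
    return best


def running_mean(samples: Iterable[float]) -> Iterator[float]:
    """Yield a progressively refined estimate of the mean (stand-in analysis)."""
    total = 0.0
    for n, x in enumerate(samples, start=1):
        total += x
        yield total / n
```

With a generous budget the caller receives the fully refined estimate; with a tight budget it receives whatever estimate was ready in time, or None if nothing was. A real system would also need to bound the cost of each refinement step, which this sketch does not attempt.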

Flexibility

It is important to be able to rapidly change the information processing and automated support provided by a control station. This is true both on short time scales, for example selecting a different mode in the middle of a crisis situation, and on longer time scales, where new capabilities must be added that were not anticipated when the system was built. Unfortunately, current control stations are extremely inflexible. The only way that current system architectures can support the massive bandwidth requirements of a control station is to provide independent paths through the system hardware and software for different data streams, with each path enhanced by custom signal processing hardware and hand-coded software modules to reduce and control latency. Future control stations will require flexibility approaching that of current general-purpose systems despite massive bandwidth and predictable latency requirements.

Summary

Although the requirements to support massive bandwidth, predictable latency, and flexibility are particularly stringent for control stations, similar requirements will occur in a wide class of applications. The expected heavy use of video streams in future applications will create much higher bandwidth demands than present general-purpose systems are optimized to support, while the increasing importance of the Internet stresses the ability of servers to provide low-latency response under high load. In other words, the class of applications that are considered to be general-purpose applications will soon include most of the features of control stations. Therefore mechanisms and approaches developed to improve control stations will have benefits for a broad range of future systems.

Research Challenges

The control station is sufficiently different from current general-purpose applications that it creates the need for innovation at all levels of computer system design.

Hardware architecture

The control station is a data-centric rather than a computation-centric application. To be optimal for this application, the hardware architecture should attempt to maximize utilization of interconnect bandwidth rather than processor cycles. This suggests a general design principle in which processing is moved out into the data stream.

There are many ways to realize this general principle. Staying close to current hardware designs, the machine might support application-specific computation in I/O controllers, cache coherence controllers, memory controllers, or the memory chips themselves. The challenge compared to previous efforts in this direction is to regularize these highly heterogeneous execution environments so they are easier to program and so applications can be made portable, and to virtualize the underlying hardware resources so they can be safely shared by multiple applications. Given the high bandwidth requirements of a control station, it may also be appropriate to provide reconfigurable logic in some of these locations to supplement the embedded processors that are generally slower than the system's main processors.

More radical data-centric hardware architectures could scatter multiple processors across a point-to-point interconnect and do computation wherever data flows happen to intersect or require transformation. Compared to previous work in this direction such as systolic arrays, the novel challenge is to achieve the flexibility required by a control station.

Both more standard and more radical architectures need to expose their dynamic behavior to software better than current architectures do. The flexibility requirement of the control station prevents fully static allocation of hardware resources, so hardware bottlenecks may develop at run time. These loads must be visible to software so immediate corrective action can be taken to avoid violating latency requirements.

Operating systems

The architectural changes just described will undoubtedly create significant new operating system challenges. However, the bandwidth and latency requirements of control stations will require research on operating system design even if current hardware platforms are used. The mission-critical nature of control stations also affects the operating system, since the system must provide graceful degradation in functionality rather than total failure when hardware errors occur.

The massive bandwidth requirement has two primary effects. First, it makes interconnect bandwidth a critical system resource that the operating system may need to manage explicitly. Second, massive bandwidth puts new emphasis on the existing mechanisms for zero-copy data manipulation and gang scheduling. Existing work on various aspects of zero-copy manipulation must be unified and extended to the entire operating system API. Gang scheduling is required to avoid excessive buffering requirements in application-level data pipelining. Both of these must be supported in the context of a tightly integrated system of multiple computers rather than just within a single machine.

An important way to reduce copies and improve overall bandwidth is to give applications direct access to hardware resources. Careful design of APIs is required to make direct hardware access safe in the face of application errors and context switching. This has already been explored by several research groups in the context of low-latency message passing, but control stations will benefit from regularizing the direct hardware access interface so it can be applied to a broad range of I/O devices.

The requirement for predictable latency puts special stress on the operating system, since the flexibility requirement rules out many of the design techniques used in current real-time systems. Possible techniques for managing latency under load include OS extension mechanisms so applications can execute more frequently with lower overhead, and exporting performance models of various OS services so applications can reduce their resource usage or service requests to achieve latency goals.
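As one way to picture an exported performance model, the sketch below assumes that an OS service publishes a simple linear cost model for each of its service levels, and that the application picks the highest-quality level predicted to fit its latency budget. The ServiceLevel type, its fields, and choose_level are hypothetical; the text does not prescribe any particular model or interface.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ServiceLevel:
    """Hypothetical cost model a service might export for one service level."""
    name: str
    quality: int            # higher means a richer result
    fixed_cost_ms: float    # assumed per-request overhead
    cost_per_kb_ms: float   # assumed cost per kilobyte processed

    def predicted_latency_ms(self, payload_kb: float) -> float:
        # Assumed linear model: overhead plus a per-kilobyte term.
        return self.fixed_cost_ms + self.cost_per_kb_ms * payload_kb


def choose_level(
    levels: Sequence[ServiceLevel], payload_kb: float, budget_ms: float
) -> Optional[ServiceLevel]:
    """Pick the highest-quality level predicted to meet the latency budget."""
    feasible = [
        level for level in levels
        if level.predicted_latency_ms(payload_kb) <= budget_ms
    ]
    # None signals that no level fits and the request should be shed or deferred.
    return max(feasible, key=lambda level: level.quality, default=None)
```

The application, not the operating system, decides how to degrade here, which matches the idea of applications reducing their own resource usage or service requests. A model exported by a real system would presumably also depend on current load, which this sketch leaves out.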

Both operating systems and hardware architectures will need to be reconsidered in light of the radical change in data access behavior implied by the switch to streaming data. Current systems exploit temporal locality of data access at all levels, for example in operating system paging algorithms and hardware caches. In a control station there is still temporal locality at the shortest time span, but across longer time spans the temporal locality of data access is replaced by temporal locality of function in which the same function operates on multiple pieces of data. Therefore caches will become less effective but access patterns will become more predictable. It will be as important to take advantage of that predictability at all levels as it is to take advantage of locality at all levels in current systems.

Compilers

The control station creates three challenges for compiler research.

The first challenge is to reduce the cost of software flexibility. Flexibility requires clean interfaces between software components, while achieving high bandwidth and low latency frequently requires close coupling. Compilers can bridge this gap by combining multiple modules and eliminating the runtime costs traditionally caused by clean interfaces. This has been a goal of compiler research for some time, but becomes even more important given the performance requirements of control stations. There is an important synergy with advanced operating system architectures: to the extent that operating system extension mechanisms make portions of the operating system code visible to and specializable by the compiler, the cost of the operating system API and the number of data copies it entails can be significantly reduced without sacrificing the clean interface.
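The effect of combining modules can be shown by hand in a small sketch. It assumes each software component is a per-sample function with a clean interface; run_staged keeps the components separate and materializes a buffer between them, while fuse and run_fused mimic what a compiler might produce by merging them into a single pass. All names are hypothetical, and the fusion here is done manually at run time rather than by a compiler.

```python
from typing import Callable, Iterable, List, Sequence

Stage = Callable[[float], float]


def run_staged(stages: Sequence[Stage], samples: Iterable[float]) -> List[float]:
    """Modular form: each stage reads and writes its own intermediate buffer."""
    data = list(samples)
    for stage in stages:
        data = [stage(x) for x in data]  # one full intermediate copy per stage
    return data


def fuse(stages: Sequence[Stage]) -> Stage:
    """Combine the stages into one per-sample function (manual stand-in for fusion)."""
    def fused(x: float) -> float:
        for stage in stages:
            x = stage(x)
        return x
    return fused


def run_fused(stages: Sequence[Stage], samples: Iterable[float]) -> List[float]:
    """Fused form: a single pass over the data with no intermediate buffers."""
    kernel = fuse(stages)
    return [kernel(x) for x in samples]
```

Both forms compute the same result, but the fused form touches each sample once and allocates only the output. The programmer would still write the modular version; the point of the compiler research described above is to obtain the second form automatically.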

The second challenge is for the compiler to manage data flows and partition computation to minimize bandwidth requirements. System flexibility and application portability demand that this process be much more automated than it is in current systems. Novel analysis techniques and layout algorithms are required. Partitioning becomes especially critical when some of the advanced hardware mechanisms described earlier are available: the computation requested by the programmer must be divided among the main processors, embedded processors in various parts of the system, and reconfigurable logic if it is present.

The third challenge is for the compiler to participate in dynamically rebalancing the system as external inputs or user requests change. This has traditionally been the sole responsibility of the operating system, but given the bandwidth and latency requirements of a control station, there can be significant benefits from taking advantage of the compiler and its intimate knowledge of the internals of the applications.

Programming languages

To make control stations as flexible as current general-purpose systems, the programming environment must make it as easy as possible to build new applications and understand their behavior. In addition to requiring research on compilers, performance monitors, and debuggers, the computational model for control stations is sufficiently different from that of current applications that this requirement may justify novel language mechanisms. Data streams, that is, data objects whose values change over time rather than being stable until modified by the sequential flow of control, are a vital data type that may deserve first-class status. External wall clock time should be ubiquitous, for example making the expected latency of methods visible to clients of an interface. Language features that interfere with the compiler's ability to partition and pipeline the computation should be deprecated. Mechanisms for making cache behavior more predictable should be added.
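A rough sketch of what a first-class data stream with visible latency might look like, emulated as a library type rather than a language feature. It assumes a stream can be modeled as a function from wall clock time to a value, and that each transformation declares an expected cost; the Stream class, its fields, and the cost figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class Stream(Generic[T]):
    """Hypothetical stream type: a value that varies with wall clock time."""
    sample: Callable[[float], T]        # value of the stream at time t (seconds)
    expected_latency_ms: float = 0.0    # declared latency from source to here

    def map(self, fn: Callable[[T], U], cost_ms: float) -> "Stream[U]":
        """Derive a new stream and accumulate the declared cost of `fn`."""
        source = self.sample
        return Stream(
            sample=lambda t: fn(source(t)),
            expected_latency_ms=self.expected_latency_ms + cost_ms,
        )
```

A client holding a derived stream can read expected_latency_ms before subscribing, which is one way the expected latency of an interface could be made visible. The declared costs are taken on trust in this sketch; a language-level mechanism could instead have the compiler or runtime derive and check them.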

Summary

There are significant synergies between the research efforts required to address these challenges at the various levels of the system. Data-centric hardware architectures require operating system and compiler support to function efficiently. Operating system latency management and safe direct application access to the hardware require novel hardware support. Automated partitioning and pipelining of the application by the compiler may benefit significantly from novel language features. While there are clearly ways that each level of the system could be improved independently, designing the levels to work smoothly together will provide the quantum leap in system capability demanded by the stringent requirements of control stations.

Conclusion

The control station will become a ubiquitous computing paradigm in the next century as more and more functions of industry and government come to depend on rapid, effective response to situational data gathered throughout the organization. Control stations create requirements for massive bandwidth, predictable latency, and flexibility that cannot be satisfied by current system designs. Satisfying these requirements will require progress at all system levels with the potential for significant synergy between features at the various levels. The control station is therefore a fruitful direction for computer systems research.


