2001 Texas ARP/ATP Awards

2001 Texas Advanced Research/Technology Program Awards

The Texas Higher Education Coordinating Board announced the 2001 Advanced Research/Technology Program (ARP/ATP) awards on October 25, 2001. Funding was awarded to 402 of the 3,100 project proposals covering 21 research fields.

Three LASR projects are among the funded proposals. The project periods are 01/01/2002 through 12/31/2003.

  1. Lorenzo Alvisi (with Calvin Lin), Scalable Low-Overhead Fault-Tolerance, $147,000 (ARP).

    This project will investigate techniques for providing fault-tolerance to some of the world's fastest computers, namely, the ASCI computer clusters found at Los Alamos, Sandia, and Livermore National Labs. Our approach will concentrate on rollback recovery techniques, which require minimal dedicated resources while imposing little performance degradation. These techniques have received considerable attention in the literature for their nice theoretical properties, but have failed to provide real fault-tolerance solutions for real systems. Our project instead aims to improve rollback recovery techniques by applying them to a very real and challenging problem, one that the scientific community is desperate to solve. At the same time, we aim to shed new light on the fundamental properties of the various rollback recovery protocols by stress-testing them against supercomputer systems that are at least two orders of magnitude larger than any system that has ever been used to study such protocols.

    The main outcome of this research will be a prototype toolkit that will provide low-overhead fault-tolerance for scientific applications running on ASCI clusters. We expect that the insights that we gain in developing the toolkit will lead to novel fault-tolerance techniques and algorithms that will be both theoretically and experimentally sound.

  2. Lorenzo Alvisi and Harrick Vin, Resource Management in Server Clusters, $150,000 (ATP).

    This project will investigate techniques for building highly scalable server clusters capable of efficiently co-hosting a large number of services simultaneously. There are two aspects to this problem. First, cluster resources must be allocated to services based on their current demand. Second, the load across servers must be balanced such that the performance of each service scales linearly with the cluster resources allocated to the service. We address both problems. The outcome of this research will be new algorithms, architectures, and prototype implementations of highly scalable server clusters.

  3. Mike Dahlin, Resource Management for Safe Deployment of Edge Services, $125,000 (ATP). Collaborative project with Dan Wallach, Rice University (also awarded $125,000).

    We propose to examine how to safely support "dynamic edge services" in wide area networks (WANs) by limiting the resources they consume.

    Edge services have recently been popularized by companies such as Akamai and Digital Island, which place caching servers throughout the Internet and direct Web requests to servers close to users. These systems provide high availability and improved performance relative to traditional Web servers. However, these systems distribute only "static" content, typically GIF or JPEG images to be embedded inside Web pages. To support dynamic content generation, edge services face the more difficult problem of managing the execution of arbitrary computer programs. These programs may be buggy, they may consume excessive resources, or they may even be hostile to one another.

    We propose to design and prototype a system that can efficiently allocate resources across these programs to maximize system throughput and to provide worst-case service guarantees. Our system will be robust against denial-of-service attacks from malicious or buggy programs and will scale to support thousands of concurrently executing programs.
