The University of Texas at Austin
Harrick Vin home page | Computer Sciences



 Project Summary

The characteristics of distributed applications that desire fault-tolerance are changing. In the past, fault-tolerance was an important requirement of mission-critical applications, whose primary concerns were continuous availability and the ability to tolerate arbitrary failures; the associated costs and the overhead imposed by fault-tolerance techniques were secondary concerns. In contrast, most emerging distributed applications are not necessarily mission-critical and desire fault-tolerance techniques that
  1. impose minimal overhead on failure-free execution,
  2. provide fast crash recovery from common-case failure scenarios,
  3. use few dedicated resources, and
  4. can be transparently integrated with applications.
The goal of the Egida project is to meet these requirements by developing a new framework using rollback-based recovery protocols (such as message-logging and checkpointing) for applications in which processes communicate by messages, files, or a combination of the two.
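As a rough illustration of the rollback-based recovery idea mentioned above, the following minimal sketch combines checkpointing with message logging for a single toy process. It is not taken from Egida; the class `LoggedProcess`, its methods, and the in-memory "stable storage" are all hypothetical names chosen for this example, and the sketch assumes that message delivery is the only source of nondeterminism.

```python
import copy


class LoggedProcess:
    """Toy process (hypothetical, not Egida's API) whose state is a running
    total of the integers it has received."""

    def __init__(self):
        self.state = {"total": 0, "delivered": 0}
        # Assumption: these two fields stand in for stable storage
        # that survives a crash.
        self._checkpoint = copy.deepcopy(self.state)
        self._log = []  # messages delivered since the last checkpoint

    def deliver(self, msg):
        # Log the message before applying it, so it can be replayed later.
        self._log.append(msg)
        self._apply(msg)

    def _apply(self, msg):
        self.state["total"] += msg
        self.state["delivered"] += 1

    def checkpoint(self):
        # Save the current state; earlier log entries are no longer needed.
        self._checkpoint = copy.deepcopy(self.state)
        self._log.clear()

    def crash(self):
        # Simulate losing all volatile state.
        self.state = None

    def recover(self):
        # Roll back to the last checkpoint, then replay logged messages
        # in their original delivery order.
        self.state = copy.deepcopy(self._checkpoint)
        for msg in self._log:
            self._apply(msg)


def demo():
    p = LoggedProcess()
    p.deliver(3)
    p.deliver(4)
    p.checkpoint()
    p.deliver(5)
    before = copy.deepcopy(p.state)
    p.crash()
    p.recover()
    return before, p.state
```

Calling `demo()` returns the pre-crash and post-recovery states, which match because every message delivered after the checkpoint was logged before it was applied. A real protocol also has to coordinate recovery across processes and decide where the log is stored, which this sketch leaves out.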


 Team Members

Jeff Napper, Sriram S. Rao, Phoebe Weidman, Ravi Chamarajnagar, Lorenzo Alvisi

 Project Publications


A Fault-Tolerant Java Virtual Machine
J. Napper, L. Alvisi, and H.M. Vin
Proceedings, International Conference on Dependable Systems and Networks (DSN 2003), San Francisco, CA, June 2003 (to appear).

The Cost of Recovery in Message Logging Protocols
S. Rao, L. Alvisi, and H.M. Vin
IEEE Transactions on Knowledge and Data Engineering, Vol. 12, No. 2, March/April 2000, pp. 160-173.

Egida: An Extensible Toolkit for Low-overhead Fault-Tolerance
S. Rao, L. Alvisi, and H.M. Vin
Proceedings, IEEE International Conference on Fault-Tolerant Computing (FTCS), June 1999, pp. 48-55.

An Analysis of Communication-Induced Checkpointing
Lorenzo Alvisi, Elmootazbellah Elnozahy, Sriram S. Rao, Syed A. Husain, and Asanka de Mel
Proceedings, International Conference on Fault-Tolerant Computing (FTCS), June 1999, pp. 242-249.

Cost of Recovery in Message Logging Protocols
S. Rao, L. Alvisi, and H.M. Vin
Proceedings, IEEE Symposium on Reliable Distributed Systems, November 1998.

Hybrid Message Logging Protocols for Fast Recovery
S. Rao, L. Alvisi, and H.M. Vin
Proceedings, 28th Annual International Symposium on Fault Tolerant Computing (FTCS-28), June 1998.

Low-overhead Protocols for Fault-tolerant File Sharing
L. Alvisi, S. Rao, and H.M. Vin
Proceedings, 18th International Conference on Distributed Computing Systems (ICDCS), Amsterdam, May 1998.
