Egida

Project Summary


The characteristics of distributed applications that need fault-tolerance are changing. In the past, fault-tolerance was mainly a requirement of mission-critical applications, whose primary concerns were continuous availability and the ability to tolerate arbitrary failures; the cost and overhead imposed by fault-tolerance techniques were secondary. In contrast, most emerging distributed applications are not necessarily mission-critical and call for fault-tolerance techniques that (1) impose minimal overhead on failure-free execution, (2) provide fast crash recovery from common-case failure scenarios, (3) use few dedicated resources, and (4) can be transparently integrated with applications. To meet these requirements, the lightweight fault-tolerance (LiFT) project aims to develop a new framework based on rollback-based recovery protocols (such as message-logging and checkpointing) for applications in which processes communicate through messages, files, or a combination of the two.

Key Results:

Team Members: Phoebe Weidman, Ravishankar Chamarajnagar, Jeff Napper, Stefano Masini, Lorenzo Alvisi, Harrick M. Vin (Alumnus: Sriram S. Rao)

See Also: Lorenzo Alvisi's research web page on Lightweight Fault-Tolerance





07 February 2001
Site maintained by Sara Strandtman
sds@cs.utexas.edu