Lightweight Fault-Tolerance

As distributed computing becomes commonplace and many more applications face the current costs of high availability, there is a fresh need for recovery-based techniques that combine high performance during failure-free executions with fast recovery. Yet although the literature contains approximately 300 papers in this area, rollback recovery is seldom used in practice to build reliable distributed applications. The Lightweight Fault-Tolerance (LiFT) project aims to change this state of affairs with an approach that blends algorithmic work, systems building, and empirical analysis.

Highlights

Current Focus

Publications