Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults

Allen Clement, Mirco Marchetti, Edmund L. Wong, Lorenzo Alvisi, and Mike Dahlin

Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI) 2009.

areas
Distributed Systems

abstract
This paper argues for a new approach to building Byzantine fault tolerant replication systems. We observe that although recently developed BFT state machine replication protocols are quite fast, they don’t tolerate Byzantine faults very well: a single faulty client or server is capable of rendering PBFT, Q/U, HQ, and Zyzzyva virtually unusable. In this paper, we (1) demonstrate that existing protocols are dangerously fragile, (2) define a set of principles for constructing BFT services that remain useful even when Byzantine faults occur, and (3) apply these principles to construct a new protocol, Aardvark. Aardvark can achieve peak performance within 40% of that of the best existing protocol in our tests and provide a significant fraction of that performance when up to f servers and any number of clients are faulty. We observe useful throughputs between 11706 and 38667 requests per second for a broad range of injected faults.