Byzantine Fault Tolerance and Beyond

This project seeks to improve system security and robustness by building distributed services that tolerate buggy, selfish, or malicious nodes. We use replication and Byzantine fault tolerant (BFT) protocols to mask incorrect behaviors.

Our main results to date have fallen into two broad categories.





Technical achievements

BAR (Byzantine Altruistic Rational) Fault Tolerance

Large fault-tolerant systems typically cross administrative boundaries, especially in the case of peer-to-peer or cooperative applications. In this case, the traditional Byzantine model is no longer appropriate and a new model is needed to take into account the fact that the participants may deviate from the protocol when acting in their self-interest. The problem is not that Byzantine failures are not general enough, but rather that they are too general: we know that no protocol can tolerate a majority of Byzantine nodes, yet in a multi-administrative-domain deployment it is in fact possible that a majority of the nodes act for their self-interest and therefore deviate from the protocol. The question we answer is: how can we get these self-interested nodes to cooperate?

We propose a principled approach to this problem. First, we introduce a model, BAR, that specifies for both Byzantine and rational failures. We use the word "rational" to mean the nodes that may deviate from the protocol if it is in their self-interest. We believe that protocols that are designed for deployment over multiple administrative domains should strive to be BAR-tolerant.

Second, we show how BAR-tolerant protocols can be built. We provide BAR-tolerant primitives, with which we show how to build a BAR-tolerant replicated state machine. This replicated state machine then greatly simplifies the design of BAR-tolerant protocols. As an illustration, we show a BAR-tolerant cooperative backup application, BAR-B.

We hope that this paper will encourage more researchers to explore the BAR model. Our community knows a lot about Byzantine fault-tolerant protocols, and a lot of real systems are designed to withstand Byzantine failures. In contrast, all we know of BAR-tolerant protocols is that there exists at least one (the one we show in the paper). We hope that real systems of the future will be designed to withstand not only a minority of Byzantine failures, but also a majority of rationally-behaving nodes.

Fast Byzantine Consensus

What is the minimum possible latency for consensus and the replicated state machine? In this paper we show that the answer is 2 and 3 communication steps, respectively. While we are not the first to look at this problem, we are the first (as far as we know) to provide an asynchronous consensus protocol for unreliable links and Byzantine failures that can complete in two communication steps (the protocol is safe in that model, but naturally only live under nicer conditions).

We show that in order to complete in two communication steps despite failures (a property we call "2-step"), there must be a minimum of 5f+1 processes to tolerate f Byzantine failures. The cost is fairly high, so we also show a variant (Parameterized FaB) that can complete in two communication steps with as few as 3f+1 processes (the optimal number of processes). The catch then is that 2-step operation is only guaranteed if no failure actually occurs. In order to tolerate f failures and complete in two communication steps despite t actual failures, Parameterized FaB requires 3f+2t+1 processors.

Dynamic Quorums

When you have a distributed system and a machine fails, you naturally need to put a new one in its place. What if you want to add machines to your total, or maybe use a whole new set of servers for your task?

Few systems allow you to change server membership, even though it is an important concern in practice. If you are using the replicated state machine approach then it is easy to change server membership. However, in some cases the state machine approach is too heavyweight. Almost all lightweight Byzantine fault-tolerant protocols use the quorums approach.

My advisor and I have determined that it is possible for a quorum-based protocol to change its servers in a lightweight manner. Previous attempts required additional servers to enable dynamism, but we have determined how to do this without adding any server on top of the minimal number that is necessary to provide Byzantine fault-tolerant storage in the first place. We have shown how to do this for a storage system, but we expect that the same technique can be used on any quorum protocol.

Separating Agreement from Execution

In this work, we revisit the traditional approach to replicated state machines (replicated state machines are used to build fault-tolerant systems). We show that the RSM should be split into two parts: agreement and execution. The first part generates a unique sequence from the requests sent to the RSM, and the second part executes the requests.

This separation allows for a better understanding of RSM, but they also have the direct practical impact of showing how one can build a RSM using only 2f+1 execution replicas (to tolerate f Byzantine faults) instead of the 3f+1 required by previous approaches, thus saving cost.

The separation also makes it easier to get more out of replicated state machines. The privacy firewall described below follows directly from the separation of agreement and execution.

Privacy Firewall

Traditionally, redundancy is used to improve either the performance or the availability of services. When the servers contain confidential information, the additional servers introduced by redundancy make it more likely that an attacker would be able to find a server with a vulnerability he knows how to exploit.

We are exploring how to use redundancy in a novel way to build "privacy firewalls" that protect the confidentiality of replicated data. In particular, we are experimenting with an intrusion-tolerant nfs that enforces the access rights to files even if some machines have been compromised.

Minimal Byzantine Storage

Hortum is a new and efficient distributed storage protocol that tolerates Byzantine failures. In other words, Hortum makes it possible to store data "on the network" in such a way that it can be recovered (automatically) even if several servers on the network are not reachable or have been hacked into (i.e. are actively trying to defeat the storage system).

In the course of this research we have submitted two papers to conferences. The corresponding publications and technical reports are available:

High-throughput Byzantine State Machine Replication

We argue for a simple change to Byzantine Fault Tolerant state machine replication libraries in order to provide high throughput. Traditional state machine replication based Byzantine fault tolerant (BFT) techniques provide high availability and security but fail to provide high throughput. This limitation stems from the fundamental assumption of generalized state machine replication techniques that all replicas execute requests sequentially in the same total order to ensure consistency across replicas. We propose a high throughput Byzantine fault tolerant architecture that uses application-specific information to identify and concurrently execute independent requests. Our architecture thus provides a general way to exploit application parallelism in order to provide high throughput without compromising correctness. Although this approach is extremely simple, it yields dramatic practical benefits. When sufficient application concurrency and hardware resources exist, CBASE, our system prototype, provides orders of magnitude improvements in throughput over BASE, a traditional BFT architecture. CBASE-FS, a Byzantine fault tolerant file system that uses CBASE, achieves twice the throughput of BASE-FS for the IOZone micro-benchmarks even in a configuration with modest available hardware parallelism.

Mike Dahlin's other research projects