Building MAD Distributed Systems

The goal of this project is to develop the theory and practice required for building cooperative services that span multiple administrative domains (MADs).

MAD systems are attractive because their diffused control structure may yield services that are potentially less costly and more democratic than their more centralized counterparts. Unfortunately, they are also particularly problematic from a dependability standpoint as they challenge the traditional distinction between correct and faulty nodes.

Nodes in a MAD system can, as always, deviate from their specification because they are broken, on account of bugs, errors in software configuration, or even malicious attacks. But MAD systems add a new dimension: without a central administrator to ensure that all unbroken nodes follow faithfully their assigned protocol, nodes may deviate from their specification also because they are selfish and are intent on maximizing their own utility.

Byzantine Fault Tolerance (BFT) handles the first class of deviations well. However, the Byzantine model classifies all deviations as faults and requires a bound on the number of faults in the system; this bound is not tenable in MAD systems where all nodes may benefit from selfish behavior and be motivated to deviate from the protocol. Models based on traditional game theory only account for rational behavior and are therefore brittle: they handle the second class of selfish deviations, but may be vulnerable to arbitrary disruptions if even a single node is broken and deviates from expected rational behavior.

The challenge in developing a solid foundation for constructing MAD services is then (at least) threefold: (1) to develop a model for MAD services in which it is possible to reason and prove properties of MAD services; (2) to understand how to simplify the development of MAD services under the new model, (3) to demonstrate that MAD services developed under this model can be practical by building and deploying useful applications.

Our first steps towards addressing these issues appear in

where we introduce the BAR model, named after the initials of the three classes of nodes (Byzantine, Altruistic, and Rational) that it explicitly considers. Byzantine nodes can deviate arbitrarily from their specification, even if doing so is against their interest. Altruistic nodes follow their specification faithfully, without consideration of their self interest. Rational nodes behave selfishly and deviate from a given protocol if doing so improves their own utility. The paper illustrates the construction of a BAR peer-to-peer backup service at whose core is a BAR-tolerant asynchronous replicated state machine protocol that uses Rational and Byzantine nodes to implement the abstraction of an Altruistic node.

To further demonstrate the practicality of BAR, the following two OSDI papers:

investigate the design tradeoffs involved in the design and implementation of a BAR-tolerant data streaming service.

The earlier paper uses a BAR-tolerant gossip protocol to achieve the first peer-to-peer streaming media application that ensures high, predictable throughput even when most or all of the nodes act selfishly and the remainder act maliciously or malfunction in arbitrary ways. The broader significance of this result consists in demonstrating the possibility of designing efficient BAR-tolerant versions of primitives, such as gossip, whose inner working relies heavily on randomness. Randomness, as any source of non-determinism, is hard to deal with in the BAR model because it gives opportunities for rational users to hide selfish actions in the guise of legitimate, non-deterministic behavior.

This page is being updated. Please come back soon for a discussion of our more recent results.

Relevant Publications:


Faculty

Students

Alumni

External collaborators

Sponsors

This material is based upon work supported by:

Back to Lorenzo Alvisi's Home page.