Building MAD Distributed Systems

The goal of this project is to develop the theory and practice required for building cooperative services that span multiple administrative domains (MADs).

MAD systems are attractive because their diffused control structure may yield services that are potentially less costly and more democratic than their more centralized counterparts. Unfortunately, they are also particularly problematic from a dependability standpoint as they challenge the traditional distinction between correct and faulty nodes.

Nodes in a MAD system can, as always, deviate from their specification because they are broken, on account of bugs, errors in software configuration, or even malicious attacks. But MAD systems add a new dimension: without a central administrator to ensure that all unbroken nodes follow faithfully their assigned protocol, nodes may deviate from their specification also because they are selfish and are intent on maximizing their own utility.

Byzantine Fault Tolerance (BFT) handles the first class of deviations well. However, the Byzantine model classifies all deviations as faults and requires a bound on the number of faults in the system; this bound is not tenable in MAD systems where all nodes may benefit from selfish behavior and be motivated to deviate from the protocol. Models based on traditional game theory only account for rational behavior and are therefore brittle: they handle the second class of selfish deviations, but may be vulnerable to arbitrary disruptions if even a single node is broken and deviates from expected rational behavior.

The challenge in developing a solid foundation for constructing MAD services is then (at least) threefold: (1) to develop a model for MAD services in which it is possible to reason and prove properties of MAD services; (2) to understand how to simplify the development of MAD services under the new model, (3) to demonstrate that MAD services developed under this model can be practical by building and deploying useful applications.

Our first steps towards addressing these issues appear in

A.S. Ayer, L. Alvisi, A. Clement, M. Dahlin, J.P. Martin, and C. Porth.
BAR Fault Tolerance for Cooperative Services.
In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP 2005),
Brighton, United Kingdom, October 2005, pp. 45-58. Award Paper.

where we introduce the BAR model, named after the initials of the three classes of nodes (Byzantine, Altruistic, and Rational) that it explicitly considers. Byzantine nodes can deviate arbitrarily from their specification, even if doing so is against their interest. Altruistic nodes follow their specification faithfully, without consideration of their self interest. Rational nodes behave selfishly and deviate from a given protocol if doing so improves their own utility. The paper illustrates the construction of a BAR peer-to-peer backup service at whose core is a BAR-tolerant asynchronous replicated state machine protocol that uses Rational and Byzantine nodes to implement the abstraction of an Altruistic node.

To further demonstrate the practicality of BAR, the following two OSDI papers:

H. Li, A. Clement, E. Wong, J. Napper, L. Alvisi and M. Dahlin
BAR Gossip
In Proceedings of the 7th Symposium on Operating System Design and Implementation (OSDI '06) ,
Seattle, WA, November 2006, pp. 191-204.
H. Li, A. Clement, M. Marchetti, E. Kapritsos, L. Robison, L. Alvisi, and M. Dahlin
FlightPath: Obedience vs. Choice in Cooperative Services
In Proceedings of the 8th Symposium on Operating System Design and Implementation (OSDI '08) ,
San Diego, CA, December 2008, pp. 355-368.

investigate the design tradeoffs involved in the design and implementation of a BAR-tolerant data streaming service.

The earlier paper uses a BAR-tolerant gossip protocol to achieve the first peer-to-peer streaming media application that ensures high, predictable throughput even when most or all of the nodes act selfishly and the remainder act maliciously or malfunction in arbitrary ways. The broader significance of this result consists in demonstrating the possibility of designing efficient BAR-tolerant versions of primitives, such as gossip, whose inner working relies heavily on randomness. Randomness, as any source of non-determinism, is hard to deal with in the BAR model because it gives opportunities for rational users to hide selfish actions in the guise of legitimate, non-deterministic behavior.

This page is being updated. Please come back soon for a discussion of our more recent results.

Relevant Publications:

E. Wong, and L. Alvisi
What's a Little Collusion Between Friends?
In Proceedings of the 32nd ACM Symposium on Principles of Distributed COmputing (PODC 2013),
Monteral, Canada, July 2013.
E. Wong, and L. Alvisi
Regret Freedom isn't Free
In Proceedings of the 15th International Conference on Distributed Systems (OPODIS '11),
Toulouse, France, October 2011.
I. Abraham, L. Alvisi, and J. Halpern
Distributed Computing meets Game Theory: combining insights from two fields
In SIGACT News, 42(2),June 2011..
E. Wong, L. Alvisi, and J. Leners
It's On Me! The Benefit of Altruism in BAR Environments
In Proceedings of the 24th International Symposium on Distributed Computing (DISC 2010),
Boston, MA, September 2010.

An extended version can be found in:
E. L. Wong and L. Alvisi
It's On Me! The Benefit of Altruism in BAR Environments
UT Austin Technical Report TR-10-08.
F. Mari, I. Melatti, I. Salvo, E. Tronci, L. Alvisi, A. Clement, and H. Li
Model Checking Coalition Nash Equilibria in MAD Distributed Systems
In Proceedings of the 11th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS 2009) ,
Lyon, France, November 2009, pp. 531-546.
H. Li, A. Clement, M. Marchetti, E. Kapritsos, L. Robison, L. Alvisi, and M. Dahlin
FlightPath: Obedience vs. Choice in Cooperative Services
In Proceedings of the 8th Symposium on Operating System Design and Implementation (OSDI '08) ,
San Diego, CA, December 2008, pp. 355-368.
F. Mari, I. Melatti, I. Salvo, E. Tronci, A. Clement, and H. Li
Model Checking Nash Equilibria in MAD Distributed Systems
In Proceedings of the 8th Conference on Formal Methods in Computer-Aided Desing (FMCAD 08),
Portland, OR, November 2008, pp. 1-8.
A. Clement, H. Li, J. Napper, J. P. Martin, L. Alvisi, and M. Dahlin
BAR Primer
In Proceedings of the International Conference on Dependable Systems and Networks (DSN 2008), DCC Symposium,
Anchorage, AK, June 2008, pp. 287-296.
H. Li, A. Clement, E. Wong,J. Napper, and M. Dahlin
BAR Gossip
In Proceedings of the 7th Symposium on Operating System Design and Implementation (OSDI '06) ,
Seattle, WA, November 2006, pp. 191-204.
A.S. Ayer, L. Alvisi, A. Clement, M. Dahlin, J.P. Martin, and C. Porth
BAR Fault Tolerance for Cooperative Services.
In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP 2005),
Brighton, United Kingdom, October 2005, pp. 45-58. Award Paper.

Faculty

Students

Edmund L. Wong

Alumni

Amitanand Ayer (Facebook)
Allen Clement (Max Planck Institute for Software Systems)
Harry Li (Facebook)
Jeff Napper (CISCO)
Jean-Philippe Martin (Microsoft Silicon Valley Research Lab)

External collaborators

Enrico Tronci, Sapienza University, Rome
Federico Mari, Sapienza University, Rome
Ivano Salvo, Sapienza University, Rome