Director: Simon S. Lam (more publications)
Resilient Routing Fabric for Peer-to-Peer Networks
Simon S. Lam and Huaiyu Liu
Technical Report TR-03-13, May 16, 2003 (revised, October 12, 2003).
Several proposed peer-to-peer networks use hypercube routing for scalability. In
a previous paper, we showed that consistency of neighbor tables in hypercube
routing guarantees the existence of a path from any source node to any
destination node. Consistency, however, can be broken by the failure of one
node. To improve the robustness of hypercube routing, we generalize the concept
of consistency to K-consistency, for K ≥ 1.
We then show that a K-consistent hypercube routing network provides at
least K disjoint paths from any source node to any destination node with
a probability close to 1. The first objective of this report is the design and
specification of a new join protocol together with a proof that it generates
K-consistent neighbor tables for an arbitrary number of concurrent joins
(under the assumption that there is no concurrent leave or failure). To do so,
we construct a more general definition of C-set tree than our previous
one as the conceptual foundation
for protocol design and reasoning about K-consistency. Both the new
protocol and proof require major extensions to the ones in our previous paper to
generalize them from 1-consistency to K-consistency.
The second objective of this report is the design and evaluation of a failure recovery protocol for K-consistent networks. From simulation experiments in which up to 50% of the nodes in a K-consistent network failed (when a node fails, it becomes silent), we found that, for K ≥ 2, K-consistency was recovered in every experiment. The third objective of this report is to extend our join and failure recovery protocols such that they construct and maintain K-consistent neighbor tables for networks whose nodes join and fail concurrently and frequently. In particular, our join protocol is extended with rules to handle failures of not only existing nodes but also other joining nodes. These extended protocols, being implemented in our prototype system named Silk, will be referred to as Silk protocols. From simulation experiments in which the number of concurrent joins and failures was up to 50% of the initial network size, we found that, for K ≥ 2, Silk generated and maintained K-consistent neighbor tables after the concurrent joins and failures in every experiment. We also present an analysis of the communication and storage overheads of Silk protocols and show that Silk is scalable to a large number of network nodes