TAPER: Tiered Approach for Eliminating Redundancy in Replica Synchronization
We present TAPER, a scalable data replication protocol that
synchronizes a large collection of data across multiple
geographically distributed replica locations. TAPER can be applied to
a broad range of systems, such as software distribution mirrors,
content distribution networks, backup and recovery, and federated file
systems.
TAPER is designed to be bandwidth-efficient, scalable, and
content-based, and it does not require prior knowledge of the replica state.
To achieve these properties, TAPER provides: i) four
pluggable redundancy elimination phases that balance the trade-off
between bandwidth savings and computation overheads, ii) a
hierarchical hash-tree-based directory pruning phase that quickly
matches identical data at granularities ranging from whole directory
trees down to individual files, iii) a content-based similarity detection
technique using Bloom filters to identify similar files, and iv) a
combination of coarse-grained chunk matching with finer-grained block
matching to achieve bandwidth efficiency.
Through extensive experiments
on various datasets, we observe that in comparison with rsync, a
widely used directory synchronization tool, TAPER reduces bandwidth
consumption by 15% to 71%, performs faster matching, and scales
to a larger number of replicas.
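The directory pruning idea in item ii) can be illustrated with a minimal sketch. This is not the authors' implementation: the in-memory tree representation (a dict for a directory, bytes for a file), the function names node_hash and unmatched_paths, and the choice of SHA-1 are all assumptions made for illustration. The sketch also compares both trees in one process and recomputes hashes on every call, whereas a real protocol would exchange cached hashes over the network.

```python
import hashlib


def node_hash(node):
    """Hash a file (bytes) or a directory (dict of name -> node).

    A directory's hash covers its sorted (name, child hash) pairs, so two
    directory trees hash equally only if they match all the way down.
    """
    if isinstance(node, bytes):
        return hashlib.sha1(b"F" + node).hexdigest()
    h = hashlib.sha1(b"D")
    for name in sorted(node):
        child = node_hash(node[name]).encode("ascii")
        h.update(name.encode("utf-8") + b"\0" + child)
    return h.hexdigest()


def unmatched_paths(source, target, prefix=""):
    """Return source file paths not matched at the same path in target.

    Whenever the hashes of two subtrees agree, the whole subtree is
    pruned and never descended into.
    """
    if target is not None and node_hash(source) == node_hash(target):
        return []  # identical file or directory tree: prune here
    if isinstance(source, bytes):
        return [prefix]  # file is missing or differs at the target
    if not isinstance(target, dict):
        target = {}  # no matching directory: every child is unmatched
    paths = []
    for name in sorted(source):
        paths += unmatched_paths(source[name], target.get(name),
                                 prefix + "/" + name)
    return paths


# Hypothetical example data: only one file differs between the replicas.
source = {
    "docs": {"a.txt": b"alpha", "b.txt": b"beta"},
    "src": {"main.c": b"int main(){}", "util.c": b"v2"},
}
target = {
    "docs": {"a.txt": b"alpha", "b.txt": b"beta"},
    "src": {"main.c": b"int main(){}", "util.c": b"v1"},
}
print(unmatched_paths(source, target))  # prints ['/src/util.c']
```

In this example the "docs" directory is eliminated with a single hash comparison, and only the files left unmatched would be handed to the finer-grained phases.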
Papers and Presentations
- Navendu Jain, Mike Dahlin, and Renu Tewari. TAPER: Tiered Approach for Eliminating Redundancy in Replica Synchronization. 4th USENIX Conference on File and Storage Technologies (FAST '05), December 14-16, 2005, San Francisco, CA. [PDF] [Slides]
- Navendu Jain, Mike Dahlin, and Renu Tewari. TAPER: Tiered Approach for Eliminating Redundancy in Replica Synchronization. Technical Report TR-05-42, Department of Computer Sciences, University of Texas at Austin. [PDF]
People
Source Code and Datasets
Related Links