ABOUT:
-----
TAPER is a scalable data replication protocol that synchronizes a large collection of data across multiple geographically distributed replica locations. The complete details of the TAPER algorithm are described in the FAST '05 paper. Most of the work on TAPER was done during an internship, so the entire source code is not publicly available. However, a prototype implementation that I rewrote back in school is available; details below.

SOURCE CODE:
-----------
The source code for TAPER phases I (HHT) and III (Sliding Block), and for Bloom Filter similarity detection, is attached. Phase II uses the content-based chunking code from the LBFS implementation, which can be downloaded with the LBFS source code. The final phase, IV, uses vcdiff for differential compression, which is publicly available. To implement TAPER on your system, you only need to integrate these pieces and interface them with your specific filesystem.

DATASETS:
--------
Most of the datasets are standard software packages available from their respective sites. The web datasets used in the evaluation total about 60 GB. I don't have a server to host them, but I can upload them if needed. For proprietary reasons, the AIX dataset is not publicly available.

CONTACT:
-------
Please send any feedback, bug reports, and code enhancements to nav@cs.utexas.edu
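
EXAMPLE:
-------
To give a rough idea of what the Bloom Filter similarity detection mentioned under SOURCE CODE involves, here is a minimal Python sketch. It is not the attached code: the class and function names, the filter size, the number of hash functions, and the use of SHA-1 are all assumptions made for illustration. The idea shown is simply that each file's chunks are added to a Bloom filter, and two files are compared by counting the bits their filters have in common.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; a Python int serves as the bit vector.

    num_bits and num_hashes are illustrative defaults, not TAPER's values.
    """

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-1 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos


def filter_for(chunks):
    """Build a filter from an iterable of chunks (bytes objects)."""
    bf = BloomFilter()
    for chunk in chunks:
        bf.add(chunk)
    return bf


def similarity(a, b):
    """Fraction of the bits set in filter a that are also set in filter b."""
    set_in_a = bin(a.bits).count("1")
    if set_in_a == 0:
        return 0.0
    common = bin(a.bits & b.bits).count("1")
    return common / set_in_a


if __name__ == "__main__":
    old = filter_for([b"chunk-1", b"chunk-2", b"chunk-3"])
    new = filter_for([b"chunk-1", b"chunk-2", b"chunk-4"])
    print("estimated similarity:", similarity(old, new))
```

In a real integration the chunks would come from the content-based chunking step rather than from hand-written byte strings, and the filter parameters would need tuning for the expected number of chunks per file.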