This page provides links to published datasets (both empirical and simulated) that have been used to test methods for tasks involved in estimating very large phylogenetic trees and alignments from molecular sequences. If you are interested in contributing to this resource, please email Tandy Warnow at tandy@cs.utexas.edu.

Phylogeny Estimation

datasets for estimating trees from sequence alignments


datasets for estimating alignments of molecular sequences


datasets for estimating phylogenies from sets of input trees

Incomplete Lineage Sorting

datasets for estimating phylogenies from sets of input trees where the gene trees differ from the species tree due to Incomplete Lineage Sorting

Phylogenetic Placement

datasets for testing methods that place partial sequences (for example, short metagenomic reads) into full-length alignments and trees

Simulation Tools

software for generating simulated data

Restricted Access

unpublished datasets, not yet publicly available


Links to external web sites are for datasets and software available through other laboratories and organizations. The respective labs and organizations are responsible for these datasets and software; please contact them if you have any problems or questions regarding their material. If you experience any problems with our datasets or software, please feel free to contact us at tandy@cs.utexas.edu.
