Datasets for "Fast and accurate methods for phylogenomic analyses"

J. Yang and T. Warnow, RECOMB-CG 2011


The datasets in this collection are used to test methods for estimating species trees from gene trees and alignments, where the true gene trees can differ from the true species tree due to incomplete lineage sorting (ILS). The 17-taxon datasets were provided by Luay Nakhleh of Rice University, who simulated them for an earlier study; their sequences evolve under the Jukes-Cantor model. We simulated the remaining datasets, some of which include ILS (the ILS and nonILS labels below indicate which). Their sequences evolve under the Generalized Time Reversible (GTR) model with gamma-distributed rates across sites.

17-taxon ILS: 500 replicates, at 8 and 32 genes

file contents:


100-taxon ILS: 10 replicates, 25 genes

file contents:


100-taxon nonILS: 6 model conditions, 10 replicates, 25 and 50 genes

100L2 100L3 100S2 100L2-vbr1 100L3-vbr1 100S2-vbr1

file contents (replace 100L2 with another model condition for the corresponding file):


500-taxon nonILS: 6 model conditions, 10 replicates, 25 and 50 genes

500L5 500S3 500M3 500L5-vbr1 500S3-vbr1 500M3-vbr1

file contents (replace 500L5 with another model condition for the corresponding file):
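
For scripting over the collection, the four groups described above can be written down as a small table and enumerated. The following is only a sketch: the tuple layout and the function names are illustrative and not part of the distributed files, and it assumes that every replicate is available at every listed gene count, which this README does not state explicitly. No file names or paths are implied.

```python
from itertools import product

# Groups as described in this README:
# (group name, model conditions, number of replicates, gene counts).
# The two ILS groups have no named model conditions, so a single None
# placeholder is used for them (an assumption made for this sketch).
COLLECTION = [
    ("17-taxon ILS", [None], 500, [8, 32]),
    ("100-taxon ILS", [None], 10, [25]),
    ("100-taxon nonILS",
     ["100L2", "100L3", "100S2", "100L2-vbr1", "100L3-vbr1", "100S2-vbr1"],
     10, [25, 50]),
    ("500-taxon nonILS",
     ["500L5", "500S3", "500M3", "500L5-vbr1", "500S3-vbr1", "500M3-vbr1"],
     10, [25, 50]),
]


def enumerate_datasets(collection):
    """Yield (group, condition, genes, replicate) for every combination.

    Assumes each replicate exists at every gene count in its group.
    Replicates are numbered from 1 here purely for illustration.
    """
    for group, conditions, replicates, gene_counts in collection:
        for condition, genes, rep in product(
            conditions, gene_counts, range(1, replicates + 1)
        ):
            yield group, condition, genes, rep


def count_by_group(collection):
    """Return a dict mapping each group name to its number of combinations."""
    counts = {}
    for group, _condition, _genes, _rep in enumerate_datasets(collection):
        counts[group] = counts.get(group, 0) + 1
    return counts


if __name__ == "__main__":
    for group, n in count_by_group(COLLECTION).items():
        print(f"{group}: {n} (condition, gene count, replicate) combinations")
```

A loop over `enumerate_datasets(COLLECTION)` gives one tuple per analysis to run; mapping each tuple to an actual file is left to the user, since that depends on the archive layout.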


Please email tandy AT cs.utexas.edu if you have any questions.