Datasets for "Fast and accurate methods for phylogenomic analyses"
J. Yang, T. Warnow, RECOMB-CG 2011
The datasets in this collection are used to test methods for
estimating species trees from gene trees and alignments, where the
true gene trees can differ from the true species tree
due to incomplete lineage sorting (ILS).
The 17-taxon datasets were provided by Luay Nakhleh of Rice
University, who simulated these datasets for an earlier study.
The sequences in these datasets evolve under the Jukes-Cantor model.
We simulated the remaining datasets, some of which evolve
under ILS. The model of sequence evolution for these datasets is
Generalized Time Reversible (GTR) with gamma-distributed rates
across sites.
17-taxon ILS: 500 replicates, at 8 and 32 genes
file contents:
- species tree: 17-taxon/st/RepXstree - where X is the replicate number
- gene tree: 17-taxon/Yloci/RepXgtrees - where X is the replicate number and Y is the number of gene trees
- gene sequences: 17-taxon/Yloci/seq/RepXgseqs - where X is the replicate number and Y is the number of gene trees
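The 17-taxon path templates above can be expanded programmatically. The following is a minimal sketch, not part of the original distribution. It assumes the archive is unpacked under a root directory of your choosing and that X and Y are substituted as plain integers without zero-padding (the listing does not say whether replicate numbers are padded or whether they start at 0 or 1). The function name seventeen_taxon_paths is hypothetical.

```python
from pathlib import PurePosixPath


def seventeen_taxon_paths(root, replicate, n_genes):
    """Build the three 17-taxon file paths for one replicate.

    Assumption: X (replicate) and Y (number of gene trees) appear in the
    file names as plain integers with no zero-padding. Adjust the
    formatting if the actual files differ.
    """
    base = PurePosixPath(root) / "17-taxon"
    loci_dir = base / f"{n_genes}loci"
    return {
        "species_tree": base / "st" / f"Rep{replicate}stree",
        "gene_trees": loci_dir / f"Rep{replicate}gtrees",
        "gene_seqs": loci_dir / "seq" / f"Rep{replicate}gseqs",
    }
```

For example, seventeen_taxon_paths("data", 3, 8)["gene_trees"] gives data/17-taxon/8loci/Rep3gtrees.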
100-taxon ILS:
file contents:
- species tree: model_tree - the species tree
- gene tree: 100-taxon-ILS/seq/RepX.gtY.rose.tree.t - where X is the replicate number and Y is the gene
- gene alignment: 100-taxon-ILS/seq/RepX.gtY.rose.true.aln - where X is the replicate number and Y is the gene
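When walking the 100-taxon-ILS/seq directory, it can be useful to recover the replicate and gene from each file name. The sketch below is one possible way to do that, assuming X and Y are written as decimal digits. The function name parse_ils_filename and the labels it returns are illustrative, not part of the dataset.

```python
import re

# Matches the RepX.gtY.rose.tree.t and RepX.gtY.rose.true.aln names listed
# above. Assumption: X and Y are written as decimal digits.
_ILS_NAME = re.compile(
    r"Rep(?P<rep>\d+)\.gt(?P<gene>\d+)\.rose\.(?P<kind>tree\.t|true\.aln)$"
)


def parse_ils_filename(name):
    """Return (replicate, gene, kind) for a 100-taxon ILS file name.

    kind is "gene_tree" or "gene_alignment". Returns None when the name
    does not follow either pattern (for example, model_tree).
    """
    match = _ILS_NAME.search(name)
    if match is None:
        return None
    kind = "gene_tree" if match.group("kind") == "tree.t" else "gene_alignment"
    return int(match.group("rep")), int(match.group("gene")), kind
```

Because the pattern is anchored only at the end of the string, the function accepts either a bare file name or a full path.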
100-taxon nonILS: 6 model conditions, 10 replicates, 25 and 50 genes
100L2
100L3
100S2
100L2-vbr1
100L3-vbr1
100S2-vbr1
file contents (replace 100L2 with another model condition for the corresponding file):
- species tree: 100-taxon-nonILS/rose/100L2/rose.internal.model.100L2.reference_tree
- gene tree: 100-taxon-nonILS/rose/100L2/X/rose.internal.model.100L2.Y.tree.t - where X is the replicate number and Y is the gene
- gene alignment: 100-taxon-nonILS/rose/100L2/X/aln/rose.internal.model.100L2.Y.true_aln - where X is the replicate number and Y is the gene
500-taxon nonILS: 6 model conditions, 10 replicates, 25 and 50 genes
500L5
500S3
500M3
500L5-vbr1
500S3-vbr1
500M3-vbr1
file contents (replace 500L5 with another model condition for the corresponding file):
- species tree: 500-taxon-nonILS/rose/500L5/rose.internal.model.500L5.reference_tree
- gene tree: 500-taxon-nonILS/rose/500L5/X/rose.internal.model.500L5.Y.tree.t - where X is the replicate number and Y is the gene
- gene alignment: 500-taxon-nonILS/rose/500L5/X/aln/rose.internal.model.500L5.Y.true_aln - where X is the replicate number and Y is the gene
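The 100-taxon and 500-taxon nonILS collections share one naming scheme, so a single helper can cover both. This is a sketch under stated assumptions: the model condition names are copied from the lists above, X and Y are taken to be unpadded integers, and the names MODEL_CONDITIONS and non_ils_paths are hypothetical.

```python
from pathlib import PurePosixPath

# Model condition names as listed in this README, keyed by number of taxa.
MODEL_CONDITIONS = {
    100: ["100L2", "100L3", "100S2", "100L2-vbr1", "100L3-vbr1", "100S2-vbr1"],
    500: ["500L5", "500S3", "500M3", "500L5-vbr1", "500S3-vbr1", "500M3-vbr1"],
}


def non_ils_paths(root, condition, replicate, gene):
    """Build the nonILS paths for one model condition, replicate, and gene.

    Assumption: replicate (X) and gene (Y) appear in the paths as plain
    integers with no zero-padding.
    """
    for n_taxa, conditions in MODEL_CONDITIONS.items():
        if condition in conditions:
            break
    else:
        raise ValueError(f"unknown model condition: {condition!r}")

    cond_dir = PurePosixPath(root) / f"{n_taxa}-taxon-nonILS" / "rose" / condition
    prefix = f"rose.internal.model.{condition}"
    rep_dir = cond_dir / str(replicate)
    return {
        "species_tree": cond_dir / f"{prefix}.reference_tree",
        "gene_tree": rep_dir / f"{prefix}.{gene}.tree.t",
        "gene_alignment": rep_dir / "aln" / f"{prefix}.{gene}.true_aln",
    }
```

Looking up the taxon count from the condition name keeps the caller from having to pass it separately, and an unrecognized condition raises ValueError rather than silently producing a path that does not exist.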
Please email tandy AT cs.utexas.edu if you have any questions.