utcs Phylogenetics

Empirical Datasets with Reference Topologies

Overview

Here we present datasets that are challenging for phylogeny estimation methods. Each dataset is provided with a reference tree. For empirical datasets with highly reliable (curated) alignments, the reference tree is typically obtained by running RAxML [2] with bootstrapping and retaining only the highly supported edges (for the datasets listed below, edges with at least 75% bootstrap support). To contribute benchmarks to this resource, please email tandy@cs.utexas.edu.

Nucleic Acid Data

16S.B.ALL (27,643 sequences)

This alignment contains 27,643 ribosomal RNA sequences (each with 6,857 sites) of the 16S gene taken from bacteria.

Source: Gutell Lab CRW [1] ("Primary Alignments" section, Table 1)

Alignment Statistics
Number of Taxa: 27,643
Number of Sites: 6,857
Percent Indels: 80.0
Average Gap Length: 4.9
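
The statistics reported for each dataset can be recomputed from the alignment itself. The sketch below is one plausible reading of the two gap statistics, not a definition taken from this page: it assumes "Percent Indels" is the share of gap characters among all cells of the alignment matrix, and "Average Gap Length" is the mean length of maximal runs of consecutive gap characters within a sequence. The function name `alignment_stats` and the `-` gap symbol are illustrative choices.

```python
import re


def alignment_stats(seqs, gap="-"):
    """Summarize an alignment given as a dict: taxon name -> aligned sequence."""
    rows = list(seqs.values())
    n_taxa = len(rows)
    n_sites = len(rows[0]) if rows else 0
    if any(len(r) != n_sites for r in rows):
        raise ValueError("sequences are not aligned to a common length")

    # Assumption: percent indels = gap cells / all cells of the matrix.
    total_cells = n_taxa * n_sites
    gap_cells = sum(r.count(gap) for r in rows)

    # Assumption: a "gap" is a maximal run of consecutive gap characters.
    run_pattern = re.compile(re.escape(gap) + "+")
    run_lengths = [len(m.group()) for r in rows for m in run_pattern.finditer(r)]

    return {
        "taxa": n_taxa,
        "sites": n_sites,
        "percent_indels": 100.0 * gap_cells / total_cells if total_cells else 0.0,
        "average_gap_length": (
            sum(run_lengths) / len(run_lengths) if run_lengths else 0.0
        ),
    }


# Toy alignment (made-up data, not one of the datasets on this page).
toy = {"a": "AC--GT", "b": "A---GT", "c": "ACGTGT"}
stats = alignment_stats(toy)
```

If the values computed this way differ from those listed here, the definitions assumed above are the first thing to check.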

Reference Alignment: 16S.B.ALL.tar.bz2
Alignment method: based on secondary structure
Alignment modifications: removed sites consisting solely of indels; removed taxa consisting of 50% or more indel characters
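
The two alignment modifications listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used to prepare these datasets: the helper names `drop_all_gap_sites` and `drop_gappy_taxa` are hypothetical, `-` is assumed to be the gap symbol, and the order of the two steps (sites first, then taxa, then sites again to remove any columns left empty) is a guess.

```python
def drop_all_gap_sites(seqs, gap="-"):
    """Remove alignment columns in which every taxon has a gap."""
    names = list(seqs)
    if not names:
        return {}
    n_sites = len(seqs[names[0]])
    # Keep a column if at least one taxon has a non-gap character there.
    keep = [i for i in range(n_sites) if any(seqs[n][i] != gap for n in names)]
    return {n: "".join(seqs[n][i] for i in keep) for n in names}


def drop_gappy_taxa(seqs, max_gap_fraction=0.5, gap="-"):
    """Remove taxa whose sequence is max_gap_fraction or more gap characters."""
    return {
        name: seq
        for name, seq in seqs.items()
        if seq and seq.count(gap) / len(seq) < max_gap_fraction
    }


# Toy alignment (made-up data, not one of the datasets on this page).
toy = {"a": "AC-GT-", "b": "A--GT-", "c": "-----T"}
cleaned = drop_all_gap_sites(toy)        # column 3 is all gaps
cleaned = drop_gappy_taxa(cleaned)       # "c" is 80% gaps and is dropped
cleaned = drop_all_gap_sites(cleaned)    # dropping "c" empties the last column
```

The second column pass matters because removing a taxon can leave behind columns that no remaining taxon occupies.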

Reference Tree: 16S.B.ALL.reference.nwk
An initial tree was estimated using RAxML [2] version 7.0.4, with support values calculated using 573 bootstrap replicates; edges with less than 75% support were then contracted.
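
The contraction step described above can be illustrated with a small, dependency-free sketch. It assumes the tree is in plain Newick format with bootstrap support stored as the label of each internal node (no quoted labels or bracketed comments), and it compares topology only: branch lengths of the children that move up are left unchanged. The class and function names are illustrative, and the recursive parser may hit Python's recursion limit on very deep trees.

```python
class Node:
    def __init__(self, label="", length=None, children=None):
        self.label = label          # taxon name (leaf) or support value (internal)
        self.length = length        # branch length kept as text, or None
        self.children = children or []


def parse_newick(text):
    """Parse a simple Newick string into a Node tree."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected character at position {pos}")
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        label = text[start:pos]
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = text[start:pos]
        return Node(label, length, children)

    return parse_node()


def support_of(node):
    """Return the numeric support label of a node, or None if it has none."""
    try:
        return float(node.label)
    except ValueError:
        return None


def contract_low_support(node, threshold=75.0):
    """Collapse every internal edge whose support is below the threshold."""
    new_children = []
    for child in node.children:
        contract_low_support(child, threshold)
        support = support_of(child)
        if child.children and support is not None and support < threshold:
            new_children.extend(child.children)  # attach grandchildren directly
        else:
            new_children.append(child)
    node.children = new_children
    return node


def to_newick(node):
    """Serialize a Node tree back to Newick (without the trailing semicolon)."""
    text = ""
    if node.children:
        text = "(" + ",".join(to_newick(c) for c in node.children) + ")"
    text += node.label
    if node.length is not None:
        text += ":" + node.length
    return text


# Toy tree (made-up data): the edge with support 60 is collapsed.
tree = parse_newick("((A,B)90,((C,D)60,E)80,F);")
print(to_newick(contract_low_support(tree)) + ";")  # ((A,B)90,(C,D,E)80,F);
```

Internal nodes without a numeric label are kept, since their support cannot be judged; an edge with exactly 75% support is also kept, matching the "less than 75%" rule stated above.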


16S.T (7,350 sequences)

This alignment contains 7,350 ribosomal RNA sequences (each with 11,856 sites) of the 16S gene taken from three phylogenetic domains.

Source: Gutell Lab CRW [1] ("Primary Alignments" section, Table 1)

Alignment Statistics
Number of Taxa: 7,350
Number of Sites: 11,856
Percent Indels: 87.4
Average Gap Length: 12.1

Reference Alignment: 16S.T.tar.bz2
Alignment method: based on secondary structure
Alignment modifications: removed sites consisting solely of indels; removed taxa consisting of 50% or more indel characters

Reference Tree: 16S.T.reference.nwk
An initial tree was estimated using RAxML [2] version 7.2.6, with support values calculated using 346 bootstrap replicates; edges with less than 75% support were then contracted.


16S.3 (6,323 sequences)

This alignment contains 6,323 ribosomal RNA sequences (each with 8,716 sites) of the 16S gene taken from three phylogenetic domains.

Source: Gutell Lab CRW [1] ("Primary Alignments" section, Table 1)

Alignment Statistics
Number of Taxa: 6,323
Number of Sites: 8,716
Percent Indels: 82.1
Average Gap Length: 9.4

Reference Alignment: 16S.3.tar.bz2
Alignment method: based on secondary structure
Alignment modifications: removed sites consisting solely of indels; removed taxa consisting of 50% or more indel characters

Reference Tree: 16S.3.reference.nwk
An initial tree was estimated using RAxML [2] version 7.2.6, with support values calculated using 500 bootstrap replicates; edges with less than 75% support were then contracted.


16S.M.aa_ag (1,028 sequences)

This alignment contains 1,028 ribosomal RNA sequences (each with 4,907 sites) of the 16S gene sampled from Eucarya mitochondria [3].

Source: Gutell Lab CRW [1] ("Alignments Used in Specific Analyses" section)

Alignment Statistics
Number of Taxa: 1,028
Number of Sites: 4,907
Percent Indels: 82.6
Average Gap Length: 22.0

Reference Alignment: 16S.M.aa_ag.tar.bz2
Alignment method: based on secondary structure
Alignment modifications: removed sites consisting solely of indels; removed taxa consisting of 50% or more indel characters

Reference Tree: 16S.M.aa_ag.reference.nwk
An initial tree was estimated using RAxML [2] version 7.0.4, with support values calculated using 500 bootstrap replicates; edges with less than 75% support were then contracted.


16S.M (901 sequences)

This alignment contains 901 ribosomal RNA sequences (each with 4,722 sites) of the 16S gene sampled from mitochondria.

Source: Gutell Lab CRW [1] ("Primary Alignments" section, Table 1)

Alignment Statistics
Number of Taxa: 901
Number of Sites: 4,722
Percent Indels: 78.1
Average Gap Length: 17.2

Reference Alignment: 16S.M.tar.bz2
Alignment method: based on secondary structure
Alignment modifications: removed sites consisting solely of indels; removed taxa consisting of 50% or more indel characters

Reference Tree: 16S.M.reference.nwk
An initial tree was estimated using RAxML [2] version 7.0.4, with support values calculated using 500 bootstrap replicates; edges with less than 75% support were then contracted.


23S.M (278 sequences)

This alignment contains 278 ribosomal RNA sequences (each with 10,738 sites) of the 23S gene sampled from mitochondria.

Source: Gutell Lab CRW [1] ("Primary Alignments" section, Table 1)

Alignment Statistics
Number of Taxa: 278
Number of Sites: 10,738
Percent Indels: 83.7
Average Gap Length: 31.9

Reference Alignment: 23S.M.tar.bz2
Alignment method: based on secondary structure
Alignment modifications: removed sites consisting solely of indels; removed taxa consisting of 50% or more indel characters

Reference Tree: 23S.M.reference.nwk
An initial tree was estimated using RAxML [2] version 7.0.4, with support values calculated using 500 bootstrap replicates; edges with less than 75% support were then contracted.


23S.M.aa_ag (263 sequences)

This alignment contains 263 ribosomal RNA sequences (each with 10,305 sites) of the 23S gene sampled from Eucarya mitochondria [3].

Source: Gutell Lab CRW [1] ("Alignments Used in Specific Analyses" section)

Alignment Statistics
Number of Taxa: 263
Number of Sites: 10,305
Percent Indels: 83.5
Average Gap Length: 34.2

Reference Alignment: 23S.M.aa_ag.tar.bz2
Alignment method: based on secondary structure
Alignment modifications: removed sites consisting solely of indels; removed taxa consisting of 50% or more indel characters

Reference Tree: 23S.M.aa_ag.reference.nwk
An initial tree was estimated using RAxML [2] version 7.0.4, with support values calculated using 500 bootstrap replicates; edges with less than 75% support were then contracted.


23S.E.aa_ag (144 sequences)

This alignment contains 144 ribosomal RNA sequences (each with 8,619 sites) of the 23S gene sampled from Eucarya nuclei [3].

Source: Gutell Lab CRW [1] ("Alignments Used in Specific Analyses" section)

Alignment Statistics
Number of Taxa: 144
Number of Sites: 8,619
Percent Indels: 61.1
Average Gap Length: 13.5

Reference Alignment: 23S.E.aa_ag.tar.bz2
Alignment method: based on secondary structure
Alignment modifications: removed sites consisting solely of indels; removed taxa consisting of 50% or more indel characters

Reference Tree: 23S.E.aa_ag.reference.nwk
An initial tree was estimated using RAxML [2] version 7.0.4, with support values calculated using 500 bootstrap replicates; edges with less than 75% support were then contracted.


23S.E (117 sequences)

This alignment contains 117 ribosomal RNA sequences (each with 9,079 sites) of the 23S gene sampled from eukaryotes.

Source: Gutell Lab CRW [1] ("Primary Alignments" section, Table 1)

Alignment Statistics
Number of Taxa: 117
Number of Sites: 9,079
Percent Indels: 59.7
Average Gap Length: 12.6

Reference Alignment: 23S.E.tar.bz2
Alignment method: based on secondary structure
Alignment modifications: removed sites consisting solely of indels; removed taxa consisting of 50% or more indel characters

Reference Tree: 23S.E.reference.nwk
An initial tree was estimated using RAxML [2] version 7.0.4, with support values calculated using 500 bootstrap replicates; edges with less than 75% support were then contracted.

References

Disclaimer

Links to external web sites are for datasets and software available through other laboratories and organizations. The respective labs and organizations are responsible for these datasets and software; please contact them if you have any problems or questions regarding their material. If you experience any problems with our datasets or software, please feel free to contact us at tandy@cs.utexas.edu.
Copyright © 2009-2010 Computational Phylogenetics Lab | ACES 3.304 | University of Texas | Austin, TX 78712
Site help/questions/feedback/requests: e-mail Tandy Warnow