Supertree Estimation Datasets


This page contains datasets that may be useful for testing supertree methods, which combine a set of phylogenies into a single estimated phylogeny on the union of the taxa appearing in the input set [16]. In addition to sets of supertree inputs, the simulated datasets contain model (true) trees that can be used to assess the accuracy of estimated supertrees. Assessing accuracy on empirical datasets, however, is more problematic, since the true phylogeny is not known [2]. If you would like to contribute benchmarks to this resource, please email tandy@cs.utexas.edu.
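As a rough illustration of what the leaf set of a supertree looks like, the sketch below collects the taxa that appear in a handful of source trees. It assumes the source trees are plain Newick strings with unquoted, whitespace-free taxon labels; the three toy trees and the helper names `leaf_labels` and `taxon_union` are made up for this example and are not taken from any dataset on this page.

```python
import re

# Matches a label that directly follows "(" or ",", i.e. a leaf label.
# Assumption: labels are unquoted and contain no whitespace or Newick punctuation.
_LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)")


def leaf_labels(newick):
    """Return the set of leaf labels in one Newick string."""
    return set(_LEAF.findall(newick))


def taxon_union(source_trees):
    """Return every taxon that appears in at least one source tree."""
    taxa = set()
    for tree in source_trees:
        taxa |= leaf_labels(tree)
    return taxa


# Hypothetical source trees with overlapping but different taxon sets.
source_trees = [
    "((A,B),(C,D));",
    "((C:0.1,D:0.2):0.3,E:0.4);",
    "(A,(E,F));",
]

# A supertree built from these inputs would have these six leaves.
print(sorted(taxon_union(source_trees)))
```

No single source tree here contains all six taxa, which is the situation a supertree method is meant to handle.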

Simulated Data

SMIDGen: SMIDGen is a methodology for generating simulated datasets suitable as inputs to supertree methods. The data generation process mirrors the data collection processes used by systematists when gathering empirical data, including the creation of densely sampled clade-based trees as well as sparsely sampled scaffold trees.
Described in [1]
Studied in [2]
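Because simulated datasets come with a model tree, an estimated supertree can be scored against it. The sketch below computes one common measure, the fraction of model-tree bipartitions missing from the estimate (a false negative rate). It is an illustration only and not necessarily the measure used in [1] or [2]. It assumes both trees are Newick strings on the same leaf set with unquoted labels; the two six-taxon trees and the function names are hypothetical.

```python
import re


def clusters(newick):
    """Return (leaf labels, set of leaf sets found below each internal node)."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    stack, found, leaves, prev = [], set(), set(), None
    for tok in tokens:
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            done = stack.pop()
            found.add(frozenset(done))
            if stack:
                stack[-1] |= done
        elif tok not in (",", ";") and prev != ")":
            # A leaf token, possibly with a branch length such as "A:0.1".
            # Tokens that follow ")" are internal labels or lengths and are skipped.
            label = tok.split(":")[0].strip()
            if label:
                stack[-1].add(label)
                leaves.add(label)
        prev = tok
    return leaves, found


def bipartitions(newick):
    """Return the nontrivial bipartitions of the tree, treated as unrooted."""
    leaves, found = clusters(newick)
    ref = min(leaves)  # each split is stored as the side without this taxon
    splits = set()
    for c in found:
        side = c if ref not in c else frozenset(leaves - c)
        if 2 <= len(side) <= len(leaves) - 2:
            splits.add(side)
    return splits


def false_negative_rate(model_tree, estimated_tree):
    """Fraction of model-tree bipartitions absent from the estimated tree."""
    model = bipartitions(model_tree)
    estimate = bipartitions(estimated_tree)
    return len(model - estimate) / len(model)


# Hypothetical model tree and estimated supertree on the same six taxa.
model_tree = "(((A,B),C),(D,(E,F)));"
estimated_tree = "(((A,C),B),(D,(E,F)));"
print(false_negative_rate(model_tree, estimated_tree))
```

If the estimated supertree and the model tree do not share exactly the same taxa, the model tree would first need to be restricted to the common leaf set, which this sketch does not do.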

Empirical Data

Comprehensive papilionoid legumes: CPL.tar.gz
2228 taxa, 39 source trees estimated using RAxML [15]
Described in [4]
Studied in [4]

Marsupials: marsupials.tar.gz
267 taxa, 158 source trees
Described in [5]
Studied in [5, 9, 10, 11, 12, 13, 14]

Placental mammals: placental mammals.tar.gz
116 taxa, 726 source trees
Described in [6]
Studied in [6, 9]

Seabirds: seabirds.tar.gz
121 taxa, 7 source trees
Described in [7]
Studied in [7, 9]

Temperate herbaceous papilionoid legumes: THPL.tar.gz
558 taxa, 19 source trees
Described in [8]
Studied in [8, 9, 10, 12]



