Phylogenetics and Metagenomics
Research

Simultaneous multiple alignment and phylogeny estimation

  • We design new methods for simultaneous estimation of alignments and trees, capable of producing highly accurate trees and alignments on very large datasets. Our initial work developed the SATe method (Liu et al., Science 2009), which can produce highly accurate trees and alignments for datasets with 1,000 sequences in just 24 hours. Our research establishes that markers that evolve very quickly and seem very difficult to align can be used to advantage in large-scale phylogenetic analyses. We work with Huston-Tillotson University to provide research opportunities to their undergraduates. This research is funded by an NSF grant under the ATOL (Assembling the Tree of Life) program; see our ATOL project webpage for more information.

Metagenomics

  • We are working on improving methods for taxon identification of short reads found during metagenomic analyses. Our current work, which will appear in PSB 2012, presents a new method for phylogenetic placement of short reads. We call this method SEPP, for SATe-enabled phylogenetic placement. SEPP produces more accurate placements than the leading methods (PaPaRa, pplacer, and EPA). We are collaborating with Mihai Pop in using SEPP to improve taxon identification.

Estimating species trees from gene trees

Supertree methods

  • The main goal of this project is the design of fast and scalable supertree methods, capable of producing highly accurate trees on very large datasets (with tens of thousands of taxa). The secondary goal is to understand which taxon sampling strategies for assembling supertree datasets yield the most accurate supertrees. The outcome of this project will include distribution of usable open source software to the research community. We have developed SuperFine (see the Systematic Biology 2012 paper), which gives very fast and accurate supertrees. SuperFine is a meta-method that estimates the supertree in two steps: first a partially resolved tree is estimated, and then each high-degree node (polytomy) in that tree is refined using a base supertree method. Our initial studies used MRP, based upon heuristics in PAUP* for maximum parsimony, for this refinement step. Improvements to SuperFine in terms of accuracy and speed have been obtained using parallelism (see the ACM-SAC 2012 paper) or alternative base supertree methods (see the Algorithms for Molecular Biology 2012 paper). This research was supported by the NSF through a large ITR grant to the CIPRES project, and also through the ATOL grant for large-scale simultaneous multiple sequence alignment and phylogeny estimation.
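
The two-step structure described above can be illustrated with a minimal Python sketch. This is not the SuperFine implementation: the nested-tuple tree encoding, the function names, and the rooted view (a polytomy is any node with more than two children) are all assumptions made for illustration. In particular, `ladder_resolve` is a hypothetical stand-in for the base supertree method; a real base method such as MRP would use the source trees to decide how to resolve each polytomy, whereas this placeholder resolves it arbitrarily.

```python
from typing import Callable, List, Union

# Assumed encoding: a tree is a leaf name (str) or a tuple of child subtrees.
Tree = Union[str, tuple]


def find_polytomies(tree: Tree) -> List[tuple]:
    """Return every internal node with more than two children (rooted view)."""
    if isinstance(tree, str):
        return []
    found = [tree] if len(tree) > 2 else []
    for child in tree:
        found.extend(find_polytomies(child))
    return found


def ladder_resolve(children: List[Tree]) -> Tree:
    """Hypothetical placeholder base method: join the children into an
    arbitrary caterpillar. A real base method would consult the source trees."""
    resolved = children[0]
    for child in children[1:]:
        resolved = (resolved, child)
    return resolved


def refine(tree: Tree, base_method: Callable[[List[Tree]], Tree]) -> Tree:
    """Refinement step: hand each polytomy to the pluggable base method."""
    if isinstance(tree, str):
        return tree
    children = [refine(child, base_method) for child in tree]
    if len(children) > 2:
        return base_method(children)
    return tuple(children)


# A partially resolved tree with two polytomies: the root and (C, D, E).
partial = (("A", "B"), ("C", "D", "E"), "F")
refined = refine(partial, ladder_resolve)
print(len(find_polytomies(partial)))  # number of polytomies before refinement
print(refined)                        # fully binary after refinement
```

Passing the base method as a parameter mirrors the meta-method design: the refinement step stays the same while the base supertree method is swapped out.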

Fast techniques for ultra-large phylogeny estimation

  • We design new methods for estimating trees from ultra-large datasets, containing upwards of 10,000 taxa. Our early work produced the Rec-I-DCM3 software that is part of the CIPRES project software distribution. Rec-I-DCM3 speeds up maximum parsimony software (PAUP*) and maximum likelihood software (RAxML) on very large datasets. We are currently developing a new method, DACTAL, for producing trees for ultra-large datasets without ever requiring that a multiple sequence alignment of the entire dataset be estimated.

Estimating phylogenies from whole genomes

  • Whole genomes evolve under many processes that change the order and copy number of genes, as well as the number of chromosomes. Events such as inversions, transpositions, and inverted transpositions change the gene order and strandedness, while duplications, deletions, and insertions change the number of copies of each gene within each chromosome. Finally, events such as fissions and fusions change the number of chromosomes within the genome. Estimating phylogenies from gene order and content data presents very interesting mathematical and computational challenges. We work with Bernard Moret at EPFL (Switzerland) to develop scalable methods for estimating histories from whole genomes. See papers 35, 36, 41, 42, 43, 46, 50, 51, 54, 56, 68, 72, and 80 from my list of papers.
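
To make the rearrangement events above concrete, here is a small Python sketch of the three order-changing events on a single linear chromosome. It assumes the common signed-permutation encoding, in which each gene is an integer and its sign records the strand; the function names and index conventions are chosen for illustration only and are not taken from any particular software package.

```python
from typing import List


def inversion(order: List[int], i: int, j: int) -> List[int]:
    """Reverse the segment order[i:j] and flip the strand (sign) of each gene."""
    return order[:i] + [-g for g in reversed(order[i:j])] + order[j:]


def transposition(order: List[int], i: int, j: int, k: int) -> List[int]:
    """Cut out order[i:j] and reinsert it at index k of the remaining genes."""
    segment = order[i:j]
    rest = order[:i] + order[j:]
    return rest[:k] + segment + rest[k:]


def inverted_transposition(order: List[int], i: int, j: int, k: int) -> List[int]:
    """Like a transposition, but the moved segment is also inverted."""
    segment = [-g for g in reversed(order[i:j])]
    rest = order[:i] + order[j:]
    return rest[:k] + segment + rest[k:]


genome = [1, 2, 3, 4, 5]
print(inversion(genome, 1, 4))                  # [1, -4, -3, -2, 5]
print(transposition(genome, 1, 3, 2))           # [1, 4, 2, 3, 5]
print(inverted_transposition(genome, 1, 3, 2))  # [1, 4, -3, -2, 5]
```

All three events preserve gene content and change only order and strandedness. Duplications, deletions, and insertions would instead change the multiset of genes, and fissions and fusions would require a list of chromosomes rather than a single list.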

Computational Historical Linguistics

  • We design methods to estimate evolutionary histories for languages, with a particular focus on Indo-European. We also model language evolution, including "borrowing" between languages, as a stochastic process. This research is a collaboration with linguist Donald Ringe at the University of Pennsylvania, probabilist Steve Evans at UC Berkeley, and Luay Nakhleh at Rice University. See the Computational Phylogenetics in Historical Linguistics webpage for more information.
Copyright © 2009-2010 Computational Phylogenetics Lab | ACES 3.304 | University of Texas | Austin, TX 78712
Site help/questions/feedback/requests: e-mail Tandy Warnow