Phylogenetics and Metagenomics
Simultaneous multiple alignment and phylogeny estimation
We design new methods for simultaneous estimation of alignments and trees, capable of producing highly accurate trees
and alignments on very large datasets.
Our initial work developed the SATe method (Liu et al., Science 2009),
which can produce highly accurate trees and alignments
for datasets with 1000 sequences in just 24 hours.
Our research establishes that markers that evolve very quickly and
seem very difficult to align can
be used to advantage in large-scale phylogenetic estimation.
We work with
Huston-Tillotson University to provide research opportunities to
their undergraduates. This research is
funded by an NSF grant under the ATOL (Assembling the
Tree of Life) program; see
our ATOL project webpage for more information.
We are working on improving methods for taxon identification of
short reads found during metagenomic analyses. Our current
work, which will appear in PSB 2012, presents a new
method for phylogenetic
placement of short reads. We call this
method SEPP, for SATe-enabled phylogenetic placement.
SEPP produces more accurate placements than the leading
methods (PaPaRa, pplacer, and EPA).
We are collaborating with
in using SEPP to improve taxon identification.
Estimating species trees from gene trees
The main goal of this project is the design of fast and scalable supertree methods, capable of producing highly
accurate trees on very large datasets (with tens of thousands of taxa).
The secondary goal is to understand the taxon sampling strategies for
assembling supertree datasets that yield the most accurate supertrees.
This project will also distribute usable open-source
software to the research community.
We have developed SuperFine (see the
Systematic Biology 2012 paper), a method that produces
accurate supertrees very quickly.
SuperFine is a meta-method that estimates the supertree
in two steps: first a partially resolved
tree is estimated, and then each high degree node (polytomy)
in that tree is refined using a base supertree method.
Our initial studies used MRP (matrix representation with parsimony),
run with the maximum parsimony heuristics in PAUP*, for this refinement step.
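As an illustration of the MRP step mentioned above, the following minimal sketch builds the standard matrix-representation encoding from a set of rooted source trees. It is not SuperFine's or PAUP*'s code; the nested-tuple tree format and the function names `clades` and `mrp_matrix` are assumptions made for this example. Each non-root internal node becomes one binary character: `1` for taxa inside its clade, `0` for the other taxa in that source tree, and `?` for taxa missing from that tree.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical tree format for this sketch: a leaf is a taxon name,
# an internal node is a tuple of child subtrees.
Tree = Union[str, tuple]


def clades(tree: Tree) -> Tuple[FrozenSet[str], List[FrozenSet[str]]]:
    """Return the leaf set of `tree` and the clade of every internal node in it."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves: FrozenSet[str] = frozenset()
    found: List[FrozenSet[str]] = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)  # clade of this internal node
    return leaves, found


def mrp_matrix(source_trees: List[Tree]) -> Dict[str, str]:
    """Encode rooted source trees as one row of 0/1/? characters per taxon."""
    all_taxa = sorted(set().union(*(clades(t)[0] for t in source_trees)))
    rows: Dict[str, List[str]] = {taxon: [] for taxon in all_taxa}
    for tree in source_trees:
        leaves, found = clades(tree)
        for clade in found:
            if clade == leaves:
                continue  # the root clade separates nothing, so skip it
            for taxon in all_taxa:
                if taxon not in leaves:
                    rows[taxon].append("?")  # taxon absent from this source tree
                elif taxon in clade:
                    rows[taxon].append("1")
                else:
                    rows[taxon].append("0")
    return {taxon: "".join(states) for taxon, states in rows.items()}


if __name__ == "__main__":
    trees = [(("A", "B"), ("C", "D")), (("B", "C"), "E")]
    for taxon, row in mrp_matrix(trees).items():
        print(taxon, row)
```

A maximum parsimony search on this matrix (for example with PAUP*) would then produce the supertree; that search is outside the scope of this sketch.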
Improvements to SuperFine in terms of accuracy and speed
have been obtained using parallelism (see
ACM-SAC 2012 paper)
or alternative base supertree methods
for Molecular Biology 2012 paper).
This research was supported
by the NSF through a large ITR grant to the
and also through the ATOL grant for large-scale simultaneous multiple sequence
alignment and phylogeny estimation.
Fast techniques for ultra-large phylogeny estimation
We design new methods for estimating
trees from ultra-large datasets, containing upwards of 10,000 taxa.
Our early work produced Rec-I-DCM3,
which is part of the
CIPRES project software distribution.
Rec-I-DCM3 speeds up maximum parsimony software (PAUP*) and maximum
likelihood software (RAxML) on very large datasets. Our current work
is developing DACTAL, a new method that produces trees for ultra-large datasets without
ever requiring that a multiple sequence alignment of the entire dataset
be estimated. DACTAL is still under development.
Estimating phylogenies from whole genomes
- Whole genomes evolve under many processes that change the
order and copy number of genes, as well as the number
of chromosomes. Events such as inversions,
transpositions, and inverted transpositions
change the gene order and strandedness, while duplications,
deletions, and insertions change the number of copies of
each gene within each chromosome. Finally,
events such as fissions and fusions change the number of
chromosomes within the genome. Estimating phylogenies from
gene order and content data presents very interesting
mathematical and computational
challenges. We work with
Bernard Moret at EPFL (Switzerland)
to develop scalable methods for estimating histories
from whole genomes.
See papers 35, 36, 41, 42, 43, 46, 50, 51, 54, 56, 68, 72, and 80
from my list of papers.
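The rearrangement events described above can be made concrete with a small sketch. It assumes a common textbook representation that is not taken from the papers listed: one chromosome is a list of signed integers, where the absolute value names the gene and the sign gives its strand. The function names and the half-open index convention are choices made for this example only.

```python
from typing import List

Genome = List[int]  # signed gene order of one chromosome (assumed representation)


def inversion(genome: Genome, i: int, j: int) -> Genome:
    """Reverse the segment genome[i:j] and flip the strand of each gene in it."""
    flipped = [-gene for gene in reversed(genome[i:j])]
    return genome[:i] + flipped + genome[j:]


def transposition(genome: Genome, i: int, j: int, k: int) -> Genome:
    """Move the segment genome[i:j] to just after genome[j:k] (requires i < j < k)."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


def inverted_transposition(genome: Genome, i: int, j: int, k: int) -> Genome:
    """Like a transposition, but the moved segment is also reversed and sign-flipped."""
    flipped = [-gene for gene in reversed(genome[i:j])]
    return genome[:i] + genome[j:k] + flipped + genome[k:]


def duplication(genome: Genome, i: int, j: int) -> Genome:
    """Insert a tandem copy of genome[i:j], raising the copy number of those genes."""
    return genome[:j] + genome[i:j] + genome[j:]


def deletion(genome: Genome, i: int, j: int) -> Genome:
    """Remove the segment genome[i:j], lowering the copy number of those genes."""
    return genome[:i] + genome[j:]


if __name__ == "__main__":
    g = [1, 2, 3, 4, 5]
    print(inversion(g, 1, 4))                  # [1, -4, -3, -2, 5]
    print(transposition(g, 1, 3, 5))           # [1, 4, 5, 2, 3]
    print(inverted_transposition(g, 1, 3, 5))  # [1, 4, 5, -3, -2]
    print(duplication(g, 1, 3))                # [1, 2, 3, 2, 3, 4, 5]
    print(deletion(g, 1, 3))                   # [1, 4, 5]
```

Inversions and transpositions leave the gene content unchanged and alter only order and sign, whereas duplications and deletions change copy number. This is why gene order data and gene content data call for different models.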
Computational Historical Linguistics
- We design methods to estimate evolutionary histories
for languages, with a particular focus on Indo-European.
We also model language evolution, including "borrowing" between
languages, as a stochastic process.
This research is a collaboration with
linguist Donald Ringe at the University of
Pennsylvania, probabilist Steve Evans at UC Berkeley, and
Luay Nakhleh at Rice University. See the
Computational Phylogenetics in Historical Linguistics webpage for more information.
Copyright © 2009-2010 Computational Phylogenetics Lab |
ACES 3.304 |
University of Texas |
Austin, TX 78712