PhyloLab Complex Evolution Page
The project is a collaboration between the University of
Texas at Austin (Tandy Warnow, PI) and the University of
New Mexico (Bernard Moret, PI).
The project focuses on three main
problems: the detection and inference of reticulate evolution,
the inference of evolution from genomic data (e.g., gene order
and content data), and the reconstruction
of very large phylogenies.
Faculty
Tandy Warnow,
Department of Computer Sciences, UT-Austin
Robert Jansen,
Section of Integrative Biology, School of Biological Sciences, UT-Austin
Bernard Moret, Department of
Computer Science, University of New Mexico, Albuquerque
Randy Linder,
Section of Integrative Biology, School of Biological Sciences, UT-Austin
David Hillis,
Section of Integrative Biology, School of Biological Sciences, UT-Austin
Software
Research Topics
Phylogenetic Networks.
Phylogenies, i.e., the evolutionary histories of groups of organisms,
play a major role in representing the interrelationships among biological entities.
Many methods for reconstructing such phylogenies have been proposed, but
almost all of them assume that the underlying evolutionary history of a
given set of species can be represented by a tree. While this model gives
a satisfactory first-order approximation for many families of organisms, other
families exhibit evolutionary mechanisms that cannot be represented by a
tree. Processes such as hybridization and horizontal gene transfer result
in networks of relationships rather than trees of relationships. Although this
problem is widely appreciated, there has been comparatively little work on
computational methods for estimating evolutionary networks.
Currently, we are developing the methodologies, algorithms, and
tools for reconstructing phylogenetic networks in the presence of
hybridization and horizontal gene transfer. We are also
developing simulation tools and distance metrics for
phylogenetic networks.
Solving hard optimization problems on huge datasets.
Current approaches for phylogenetic reconstruction generally attempt to solve hard
optimization problems such as maximum parsimony and maximum likelihood.
Current techniques do not seem able to provide good analyses on
datasets containing thousands of sequences in reasonable time periods.
Finding approaches that allow reconstruction methods to scale
to datasets containing tens of thousands of sequences is the focus of
this part of the grant. Our current techniques
employ divide-and-conquer strategies to work with existing "base methods"
and are able to speed up standard software by at least one order
of magnitude.
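To illustrate why maximum parsimony is hard on large datasets, the sketch below scores a single fixed tree with Fitch's algorithm. Scoring one tree is fast; the difficulty lies in searching the enormous space of possible trees for the best score. This is a minimal illustration, not the project's software: the function names, the nested-tuple tree encoding, and the toy alignment are assumptions made for this example.

```python
def fitch_site_score(tree, states):
    """Parsimony score of one character on a rooted binary tree (Fitch's algorithm).

    tree   -- nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
              (hypothetical encoding chosen for this sketch)
    states -- dict mapping each leaf name to its state at this site
    """
    def visit(node):
        if isinstance(node, str):            # leaf: its state set is its observed state
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:                           # children agree: no extra change needed
            return common, left_cost + right_cost
        # children disagree: take the union and count one change
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def parsimony_score(tree, alignment):
    """Sum the Fitch score over all sites of an alignment (dict: leaf -> sequence)."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site_score(tree, {leaf: seq[i] for leaf, seq in alignment.items()})
        for i in range(length)
    )


# Toy data (made up for illustration): two candidate trees on four taxa.
alignment = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "ACT"}
tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_ab_cd, alignment))
print(parsimony_score(tree_ac_bd, alignment))
```

A maximum parsimony search would evaluate this score over many candidate trees and keep the lowest. The number of binary trees grows super-exponentially with the number of taxa, which is why exhaustive search is infeasible for thousands of sequences.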
The analytical study of convergence rates of different methods,
and the development of provably fast-converging methods.
Absolute fast-converging phylogenetic reconstruction methods are provably
guaranteed to recover the true tree with high probability
from sequences whose length grows only polynomially in the number of
leaves, once the edge lengths are bounded arbitrarily from above and
below. Only a few methods have been shown to be absolute
fast converging; these have all been developed in just the
last few years, and most run in polynomial time. Our new methods
outperform both neighbor joining and the previous
fast-converging methods, returning very accurate large trees
when these other methods do poorly.
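The property described above can be written more formally. The statement below is a sketch of the usual formulation; the symbols $M$, $T$, $\lambda$, $f$, $g$, $\varepsilon$, $p$, $n$, and $k$ are notation introduced here for illustration, and the underlying model of sequence evolution is left unspecified.

```latex
A method $M$ is \emph{absolute fast converging} if, for every
$0 < f \le g$ and every $\varepsilon > 0$, there exists a polynomial $p$
such that, for every model tree $(T, \lambda)$ on $n$ leaves whose edge
lengths satisfy $f \le \lambda(e) \le g$ for all edges $e$,
\[
  \Pr\bigl[\, M(S) = T \,\bigr] \;>\; 1 - \varepsilon
  \qquad \text{whenever } k \ge p(n),
\]
where $S$ is a set of sequences of length $k$ generated on $(T, \lambda)$.
```

The key point is that $p$ may depend on $f$, $g$, and $\varepsilon$ but not on the particular tree, so the required sequence length is bounded by a polynomial in the number of leaves alone.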
The inference of phylogenetic trees from gene
order and content data.
The genomes of some organisms have a single chromosome or contain
single-chromosome organelles (such as mitochondria or chloroplasts) whose
evolution is largely independent of the evolution of the
nuclear genome for these organisms. Evolutionary processes, such as
inversions and transpositions, scramble the gene order
without changing the gene content; other processes, such as duplications,
insertions, and deletions, change the gene content
as well. Because these events happen less frequently than site substitutions,
it is possible to infer deep evolutionary histories with
greater accuracy from gene order and content data.
This project is developing software for performing whole-genome
phylogenetic reconstructions for single-chromosome genomes.
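As a small illustration of how an inversion scrambles gene order without changing gene content, the sketch below represents a linear chromosome as a list of signed integers (the sign encodes strand), applies one inversion, and counts breakpoints between the two orders. It is a toy example under stated assumptions, not the project's software: the function names and the five-gene genome are hypothetical, and both genomes are assumed to contain the same genes exactly once.

```python
def invert(genome, i, j):
    """Return a copy of a signed gene order with genes i..j (inclusive) inverted.

    An inversion reverses the order of the segment and flips each gene's strand.
    """
    segment = [-g for g in reversed(genome[i:j + 1])]
    return genome[:i] + segment + genome[j + 1:]


def breakpoint_distance(a, b):
    """Count adjacencies of linear genome `a` that do not occur in genome `b`.

    Both genomes are signed orders on the genes 1..n. They are framed with
    0 and n + 1 so that changes at the chromosome ends are counted too.
    """
    def framed(genome):
        return [0] + list(genome) + [len(genome) + 1]

    adjacencies_b = set()
    fb = framed(b)
    for x, y in zip(fb, fb[1:]):
        adjacencies_b.add((x, y))
        adjacencies_b.add((-y, -x))   # the same adjacency read on the other strand

    fa = framed(a)
    return sum((x, y) not in adjacencies_b for x, y in zip(fa, fa[1:]))


# Hypothetical five-gene genome and a single inversion of genes 2..4.
ancestor = [1, 2, 3, 4, 5]
derived = invert(ancestor, 1, 3)
print(derived)
print(breakpoint_distance(ancestor, derived))
```

A single inversion creates at most two breakpoints, one at each end of the inverted segment, while the set of genes is unchanged. Distances of this kind are one way to compare gene orders when building trees from such data.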