PhyloLab Complex Evolution Page

The project is a collaboration between the University of Texas at Austin (Tandy Warnow, PI) and the University of New Mexico (Bernard Moret, PI). It focuses on three main problems: the detection and inference of reticulate evolution, the inference of evolution from genomic data (e.g., gene order and content data), and the reconstruction of very large phylogenies.

Faculty

Tandy Warnow, Department of Computer Sciences, UT-Austin

Robert Jansen, Section of Integrative Biology, School of Biological Sciences, UT-Austin

Bernard Moret, Department of Computer Science, University of New Mexico, Albuquerque

Randy Linder, Section of Integrative Biology, School of Biological Sciences, UT-Austin

David Hillis, Section of Integrative Biology, School of Biological Sciences, UT-Austin

Software

GRAPPA: Genome Rearrangement Analysis under Parsimony and other Phylogenetic Algorithms (software available for download)

Research Topics

Phylogenetic Networks. Phylogenies, i.e., the evolutionary histories of groups of organisms, play a major role in representing the interrelationships among biological entities. Many methods for reconstructing such phylogenies have been proposed, but almost all of them assume that the underlying evolutionary history of a given set of species can be represented by a tree. While this model gives a satisfactory first-order approximation for many families of organisms, other families exhibit evolutionary mechanisms that cannot be represented by a tree. Processes such as hybridization and horizontal gene transfer result in networks of relationships rather than trees of relationships. Although this problem is widely appreciated, there has been comparatively little work on computational methods for estimating evolutionary networks. Currently, we are developing methodologies, algorithms, and tools for reconstructing phylogenetic networks in the presence of hybridization and horizontal gene transfer. We are also developing simulation tools and distance metrics for phylogenetic networks.
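
To make the difference between a tree and a network concrete, here is a minimal sketch in Python. It is not taken from the project's software; the node names and the parent-list representation are assumptions chosen for illustration. A node with two parents stands for a hybridization or transfer event, and picking one parent for each such node yields one of the trees contained in the network.

```python
from itertools import product

# Hypothetical toy network: each node maps to the list of its parents.
# B is modeled as a hybrid of lineages X and Y, so it has two parents.
parents = {
    "root": [],
    "X": ["root"],
    "Y": ["root"],
    "A": ["X"],
    "B": ["X", "Y"],  # reticulation node
    "C": ["Y"],
}

def reticulation_nodes(parents):
    """Return the nodes that have more than one parent."""
    return sorted(n for n, ps in parents.items() if len(ps) > 1)

def displayed_trees(parents):
    """Yield each tree obtained by keeping one parent edge per reticulation node."""
    retic = reticulation_nodes(parents)
    for choice in product(*(parents[r] for r in retic)):
        tree = {n: list(ps) for n, ps in parents.items()}
        for r, p in zip(retic, choice):
            tree[r] = [p]
        yield tree

trees = list(displayed_trees(parents))
print(reticulation_nodes(parents))
print([t["B"] for t in trees])
```

In this toy case the network contains two trees, one placing B with A and one placing B with C. The sketch does not suppress nodes left with a single child, which a full treatment would normally do.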

Solving hard optimization problems on huge datasets. Current approaches for phylogenetic reconstruction generally attempt to solve hard optimization problems such as maximum parsimony and maximum likelihood. Current techniques do not seem able to provide good analyses of datasets containing thousands of sequences in a reasonable amount of time. This part of the grant focuses on finding new approaches that enable new techniques to scale to datasets containing tens of thousands of sequences. Our current techniques employ divide-and-conquer strategies to work with existing "base methods" and are able to speed up standard software by at least one order of magnitude.
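
As a rough illustration of why maximum parsimony is hard at scale, the sketch below scores a single fixed tree with Fitch's algorithm, which is cheap, and counts how many unrooted binary trees exist on n leaves, which is what makes the search expensive. This is a textbook sketch, not the project's divide-and-conquer method; the nested-tuple tree encoding and the toy alignment are assumptions made for the example.

```python
def fitch_site_score(tree, leaf_states):
    """Minimum number of state changes for one site on a fixed rooted binary tree.

    tree: nested 2-tuples with leaf names (strings) at the tips.
    leaf_states: dict mapping each leaf name to its character state.
    """
    def visit(node):
        if isinstance(node, str):
            return {leaf_states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        # Disjoint child sets force one extra change at this node.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]

def parsimony_score(tree, alignment):
    """Sum the Fitch score over all sites of an alignment (dict name -> sequence)."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(length)
    )

def num_unrooted_binary_trees(n):
    """Number of unrooted binary trees on n >= 3 labeled leaves: (2n-5)!!."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count

# Hypothetical toy alignment and two candidate trees.
alignment = {"a": "AAG", "b": "AAG", "c": "CAC", "d": "CAC"}
tree_ab = (("a", "b"), ("c", "d"))
tree_ac = (("a", "c"), ("b", "d"))
print(parsimony_score(tree_ab, alignment), parsimony_score(tree_ac, alignment))
print(num_unrooted_binary_trees(10))
```

Scoring one tree takes time linear in the number of leaves per site, but the number of candidate trees grows super-exponentially, so exhaustive search is out of reach long before datasets reach thousands of sequences.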

The analytical study of convergence rates of different methods, and the development of provably fast-converging methods. Absolute fast-converging phylogenetic reconstruction methods are provably guaranteed to recover the true tree with high probability from sequences whose length grows only polynomially in the number of leaves, once the edge lengths are bounded arbitrarily from above and below. Only a few methods have been shown to be absolute fast-converging; these have all been developed in just the last few years, and most run in polynomial time. Our new methods outperform both neighbor joining and the previous fast-converging methods, returning very accurate large trees where these other methods do poorly.
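
The guarantee described above can be written out as follows. The notation is an assumption introduced here for illustration: M is the reconstruction method, T the model tree on n leaves, \lambda(e) the length of edge e, f and g the lower and upper bounds on edge lengths, and S_k a set of sequences of length k generated on T.

```latex
\forall\, f, g, \varepsilon > 0 \;\; \exists \text{ a polynomial } p \text{ such that, for every model tree }
(T, \{\lambda(e)\}) \text{ on } n \text{ leaves with } f \le \lambda(e) \le g \text{ for all edges } e:
\qquad k \ge p(n) \;\Longrightarrow\; \Pr\bigl[\, M(S_k) = T \,\bigr] > 1 - \varepsilon .
```

The key point is that the polynomial p may depend on f, g, and \varepsilon but not on the particular tree, so the required sequence length stays polynomial in n across all trees that meet the edge-length bounds.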

The inference of phylogenetic trees from gene order and content data. The genomes of some organisms have a single chromosome or contain single-chromosome organelles (such as mitochondria or chloroplasts) whose evolution is largely independent of the evolution of the nuclear genome for these organisms. Evolutionary processes such as inversions and transpositions scramble the gene order without changing the gene content; other processes, such as duplications, insertions, and deletions, change the gene content as well. Because these events happen less frequently than site substitutions, it is possible to infer deep evolutionary histories with greater accuracy from gene order and content data. This project is developing software for whole-genome phylogenetic reconstruction on single-chromosome genomes.
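
The following Python sketch illustrates how an inversion changes a gene order and how two gene orders can be compared by counting breakpoints. It is not GRAPPA's implementation. It assumes, for simplicity, a linear chromosome whose genes are the signed integers 1..n (the sign marks the strand) and that both genomes have the same gene content; the function names are hypothetical.

```python
def invert(order, i, j):
    """Reverse the segment order[i:j] and flip the strand of every gene in it."""
    return order[:i] + [-g for g in reversed(order[i:j])] + order[j:]

def framed(order):
    """Add fixed end markers 0 and n+1 so that the chromosome ends count as adjacencies."""
    return [0] + order + [len(order) + 1]

def adjacencies(order):
    """All adjacencies of a signed gene order, read in both directions."""
    f = framed(order)
    adj = set()
    for a, b in zip(f, f[1:]):
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency seen from the other strand
    return adj

def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 that do not occur in g2."""
    adj2 = adjacencies(g2)
    f = framed(g1)
    return sum((a, b) not in adj2 for a, b in zip(f, f[1:]))

ancestor = [1, 2, 3, 4, 5]
descendant = invert(ancestor, 1, 4)  # invert the segment holding genes 2, 3, 4
print(descendant)
print(breakpoint_distance(ancestor, descendant))
```

A single inversion leaves the gene content unchanged and creates at most two breakpoints, one at each end of the inverted segment, which is why breakpoint counts are a natural starting point for distance estimates on gene order data.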