Phylo Lab
The University of Texas at Austin


1.
The analytical study of convergence rates of different methods, and the development of provably fast-converging methods (See this page for our work in this area)

Absolute fast converging phylogenetic reconstruction methods are provably guaranteed to recover the true tree with high probability from sequences whose length grows only polynomially in the number of leaves, once the edge lengths are bounded arbitrarily from above and below. Only a few methods have been determined to be absolute fast converging; these have all been developed in just the last few years, and most are polynomial time. In this paper, we compare pre-existing fast converging methods as well as some new polynomial time methods that we have developed. Our study, based upon simulating evolution under a wide range of model conditions, establishes that our new methods outperform both neighbor joining and the previous fast converging methods, returning very accurate large trees when these other methods do poorly.

2.
The inference of phylogenetic trees from gene order and content data (See this page for our work in this area)

The genomes of some organisms have a single chromosome or contain single-chromosome organelles (such as mitochondria or chloroplasts) whose evolution is largely independent of the evolution of the nuclear genome for these organisms. Many single-chromosome organisms and organelles have circular chromosomes. Given a particular strand from a single chromosome, whether linear or circular, we can infer the ordering of the genes, along with directionality of the genes, thus representing each chromosome by an ordering (linear or circular) of signed genes. Note that picking the complementary strand produces a different ordering, in which the genes appear in the reverse direction and reverse order. The evolutionary process that operates on the chromosome can thus be seen as a transformation of signed orderings of genes.
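The signed-ordering representation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: genes are numbered 1..n, a negative number marks a gene read in the reverse direction, and the function names are hypothetical. The inversion shown is one commonly studied rearrangement and is included as an example of a transformation, not as the lab's model of evolution.

```python
def complementary_strand(order):
    """Read the same chromosome from the other strand:
    the genes appear in reverse order and with flipped direction."""
    return [-gene for gene in reversed(order)]


def invert(order, i, j):
    """Illustrative rearrangement (assumption): reverse the segment
    order[i:j] and flip the sign of every gene inside it."""
    return order[:i] + [-gene for gene in reversed(order[i:j])] + order[j:]


def canonical_circular(order):
    """One representative for a circular signed ordering: pick the strand
    on which gene 1 is positive, then rotate so it comes first."""
    strand = order if 1 in order else complementary_strand(order)
    start = strand.index(1)
    return strand[start:] + strand[:start]


chromosome = [1, -3, 2, 4]
other_strand = complementary_strand(chromosome)
print(other_strand)
print(invert([1, 2, 3, 4], 1, 3))
print(canonical_circular(other_strand) == canonical_circular(chromosome))
```

Both strands of a circular chromosome map to the same canonical form, which is one simple way to avoid treating the choice of strand or starting gene as an evolutionary difference.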


3.
Visualization and clustering of sets of phylogenetic trees (See this page by Prof. Nina Amenta for this work)

Phylogenies (i.e. evolutionary trees) are fundamental to our understanding of evolution, and their construction is a major part of research in many areas of biology. With the production of increasing amounts of biomolecular sequence data, we are reaching a moment where the bottleneck in phylogenetics is not the quantity of data, but its analysis.

The most frequently used techniques for reconstructing trees from biomolecular sequence data attempt to solve essentially intractable problems. The result of such a phylogenetic analysis is usually not a single optimal tree, but rather the set of all the "best" trees found during the search. We are interested in using computer data visualization to help explain the biological meaning of these results.


4.
Computational phylogenetics in historical linguistics (See this page for our early research in historical linguistics)

The Computational Historical Linguistics Project is developing new methodologies for determining the evolutionary history of sets of related languages. Our methodology combines a careful implementation of the comparative method of Historical Linguistics with recent advances in tree-construction algorithms. The combination allows us to preserve the advantages of traditional subgrouping methods while transcending their limitations. The algorithm we use returns not only the optimal evolutionary tree (or trees) for the data presented, but all trees that are close to optimal. In this way, it is possible to determine which aspects of the evolutionary history are well-supported (because they appear in all the almost-optimal trees), and more generally to quantify the support for different evolutionary hypotheses.

To date, we have extensively tested a set of twelve Indo-European languages, and have made several surprising and well-supported findings. We are now extending the study to other language families, including a set of Algonquian languages, and a set of Dravidian languages.

This project is made possible by the collaboration and combined expertise of Tandy Warnow, who specializes in tree-construction algorithms, and Don Ringe, whose research is in Historical Linguistics.

Currently, we are working on extending the work of Warnow and Ringe in which the concept of Perfect Phylogeny was applied to the reconstruction of evolutionary histories of languages. The previous work of Warnow and Ringe targeted "trees of relationships", and we are trying to develop the necessary methodology and algorithm for reconstructing the evolutionary histories of languages when such histories are non-treelike (due to various reasons).
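As a rough illustration of the perfect phylogeny idea mentioned above, the sketch below handles only the simplified case of two-state characters, where a set of characters fits a single tree without repeated changes exactly when every pair passes the four-gamete test. Linguistic characters generally have more than two states, so this is not the method of Warnow and Ringe; the function names and the data are hypothetical.

```python
from itertools import combinations


def compatible(char_a, char_b):
    """Four-gamete test for two binary characters, each given as a list of
    0/1 states with one entry per language. The pair is incompatible only
    when all four state combinations (0,0), (0,1), (1,0), (1,1) occur."""
    return len(set(zip(char_a, char_b))) < 4


def admits_perfect_phylogeny(characters):
    """Binary case only (assumption): a perfect phylogeny exists exactly
    when every pair of characters is compatible."""
    return all(compatible(a, b) for a, b in combinations(characters, 2))


# Hypothetical data: rows are characters, columns are four languages.
treelike = [[0, 0, 1, 1], [0, 1, 1, 1]]
conflicting = [[0, 0, 1, 1], [0, 1, 0, 1]]
print(admits_perfect_phylogeny(treelike))
print(admits_perfect_phylogeny(conflicting))
```

A pair of characters that fails the test is one simple signal that the data may not be treelike, which is the situation the non-tree methodology is meant to address.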

5.
Phylogenetic Networks (See this page for work on phylogenetic networks)

Phylogenies, i.e., the evolutionary histories of groups of organisms, play a major role in representing the interrelationships among biological entities. Many methods for reconstructing such phylogenies have been proposed, but almost all of them assume that the underlying evolutionary history of a given set of species can be represented by a tree. While this model gives a satisfactory first-order approximation for many families of organisms, other families exhibit evolutionary mechanisms that cannot be represented by a tree. Processes such as hybridization and horizontal gene transfer result in networks of relationships rather than trees of relationships. Although this problem is widely appreciated, there has been comparatively little work on computational methods for estimating evolutionary networks.

Currently, we are working on developing the methodologies, algorithms, and tools for reconstructing phylogenetic networks in the presence of hybridization and horizontal gene transfer. We are also developing simulation tools and distance metrics for phylogenetic networks.

6.
Post-processing sets of phylogenetic trees (See this page for work in this area)

For a variety of reasons, a typical outcome of a phylogenetic analysis of a dataset can consist of many different unrooted trees, and each tree represents an equally believable estimate of the true tree. Making sense of the set of these trees is then a challenging prospect.

There are two basic approaches that have been used for this problem. The first approach (and the most popular) represents the set of trees by a single tree on the full dataset (i.e. the "consensus tree"). Consensus tree techniques such as "strict consensus" and "majority consensus" are the most popular, and have the advantage that they are polynomial time. The other approach is to restrict the trees to a (large) subset of the taxa such that all the trees either agree on this subset (Maximum Agreement Subset/Subtree) or all the trees share a common refinement (Maximum Compatible Subset/Subtree) when restricted to this subset.
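The first approach can be sketched as follows. This minimal example assumes each tree is written as nested Python tuples of taxon names, all trees share the same taxa, and a tree is summarized by its nontrivial bipartitions (splits); the helper names are hypothetical. It returns the splits of the strict and majority consensus rather than building the consensus tree itself.

```python
from collections import Counter


def leaf_set(node):
    """All taxon names at or below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def splits(tree, taxa, ref):
    """Nontrivial bipartitions of a tree, each stored as the side that
    does not contain the reference taxon, so rooting does not matter."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = taxa - below if ref in below else below
        if 2 <= len(side) <= len(taxa) - 2:
            found.add(side)
        return below

    visit(tree)
    return found


def consensus_splits(trees):
    """Return (strict, majority): splits found in every tree, and splits
    found in more than half of the trees."""
    taxa = leaf_set(trees[0])
    ref = min(taxa)
    counts = Counter(s for tree in trees for s in splits(tree, taxa, ref))
    strict = {s for s, c in counts.items() if c == len(trees)}
    majority = {s for s, c in counts.items() if 2 * c > len(trees)}
    return strict, majority


trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
    (("A", "B"), ("C", ("D", "E"))),
]
strict, majority = consensus_splits(trees)
print(sorted(sorted(s) for s in strict))
print(sorted(sorted(s) for s in majority))
```

Counting splits takes time polynomial in the number of trees and taxa, which matches the polynomial-time advantage noted above. The first and third trees are the same unrooted tree written with different rootings, and the split representation treats them as identical.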

We work on developing efficient approximation and fixed-parameter tractable algorithms for the Maximum Compatible Tree problem and in general for the problem of post-processing the outcome of a phylogenetic analysis so as to extract meaningful information common to all the trees returned.
