Tree Set Visualization Project

Evolutionary trees, or phylogenies, are an essential tool in biology, used in tasks such as understanding evolution, designing new drugs, predicting gene expression, and determining the origin of a virus strain. Scientists often end up with a large set of possible phylogenies for the same data, and they would like to understand the structure of that set and the relationships among the trees in it. We are writing computer software for this task.


This research project is supported by a grant from the National Science Foundation, NSF-ITR 0121651/0121682: "Collaborative Research: Exploring the Tree of Life." 



Overview

Large sets of possible evolutionary trees arise in several ways. The most popular optimization methods for reconstructing evolutionary trees are computationally intractable, so instead of a single optimal tree, a phylogenetic analysis usually produces a large set of the "best" trees found during the search; the biologist then needs to understand and summarize this set. A large set of trees also arises when tracing the search process followed by a program performing phylogenetic analysis. Many of these programs proceed by "walking" from tree to tree, looking for the trees that best explain the input data, so understanding such a search means understanding a big set of trees. We are designing visualization tools to view and analyze thousands of phylogenetic trees at a time efficiently. This is a collaborative project involving biologists and computer scientists at the University of Texas at Austin and Lehman College, City University of New York, together with collaborators at Hewlett-Packard's Systems Research Center.

Software

Jeff Klingner's software page, with download links.

This software runs as part of Mesquite, a modular software package for evolutionary analysis developed by Wayne and David Maddison.

Screenshot: The software lets you view sets of evolutionary trees. On the left, each point represents a tree, and the distances between points approximate the Robinson-Foulds distances between the trees as closely as possible, using multidimensional scaling (MDS). The right side shows the consensus tree of the highlighted cluster of trees (this strict consensus tree contains only the edges that occur in every tree of the cluster).
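The screenshot combines three computations: Robinson-Foulds distances between trees, a multidimensional-scaling layout of those distances, and a consensus tree for a selected cluster. The Python below is a minimal sketch of all three under stated assumptions; it is not the project's software, which runs inside Mesquite. It assumes a hypothetical input format in which each tree is a nested tuple of taxon names (the nesting implies a root, but trees are compared as unrooted), and the helper names `splits`, `rf_distance`, `strict_consensus`, and `classical_mds` are hypothetical. It uses the symmetric-difference form of the Robinson-Foulds distance (some authors divide it by two) and classical (Torgerson) MDS solved by power iteration; the caption does not say which MDS variant the software actually uses.

```python
import math
import random

# Hypothetical input format: a tree is a nested tuple of taxon-name strings.
# The nesting implies a root, but trees are compared as unrooted.


def splits(tree, taxa):
    """Return the nontrivial splits (bipartitions) induced by the tree's edges.

    Each split is stored as the frozenset of taxa on the side that does NOT
    contain a fixed reference taxon, so one bipartition has one canonical form.
    """
    taxa = frozenset(taxa)
    ref = min(taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf is just its taxon name
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        side = leaves if ref not in leaves else taxa - leaves
        if 1 < len(side) < len(taxa) - 1:  # skip one-leaf-versus-rest splits
            found.add(side)
        return leaves

    walk(tree)
    return frozenset(found)


def rf_distance(a, b):
    """Robinson-Foulds distance (symmetric-difference form) between split sets."""
    return len(a ^ b)


def strict_consensus(split_sets):
    """Splits present in every tree: the edges of the strict consensus tree."""
    return frozenset.intersection(*split_sets)


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate dist."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into a Gram (inner-product) matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    # Gershgorin bound: adding it makes the largest algebraic eigenvalue dominate.
    shift = max(sum(abs(x) for x in row) for row in b)
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):  # power iteration on b + shift * I
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i]
                 for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient: eigenvalue of b
        scale = math.sqrt(max(lam, 0.0))  # negative eigenvalues cannot be embedded
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate so the next pass finds the next-largest eigenvalue.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


if __name__ == "__main__":
    taxa = "ABCDE"
    trees = [
        ((("A", "B"), "C"), ("D", "E")),
        ((("A", "C"), "B"), ("D", "E")),
        ((("A", "B"), "D"), ("C", "E")),
    ]
    split_sets = [splits(t, taxa) for t in trees]
    dist = [[rf_distance(a, b) for b in split_sets] for a in split_sets]
    print(dist)  # → [[0, 2, 2], [2, 0, 4], [2, 4, 0]]
    print([sorted(s) for s in strict_consensus(split_sets[:2])])  # → [['D', 'E']]
    points = classical_mds(dist)
    print(round(math.dist(points[1], points[2]), 6))  # → 4.0
```

Reducing each tree to its set of splits (the bipartitions of the taxa induced by its internal edges) turns both computations into plain set operations: symmetric difference for the Robinson-Foulds distance, and intersection across the cluster for the strict consensus. The hand-written power-iteration eigensolver only keeps the sketch dependency-free; with NumPy available, calling `numpy.linalg.eigh` on the double-centered matrix would be the more usual choice.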


People


Photo, UT, January 2002. From left to right: Denise Edwards (CUNY), Silvio Neris (CUNY), Prof. Katherine St. John (CUNY), Prof. Nina Amenta (UT), Jeff Klingner (UT), and Fred Clarke (CUNY).

Staff

Faculty

Students


Papers

Here is a draft of our conference paper which will appear later this year.

Nina Amenta and Jeff Klingner. Case Study: Visualizing Sets of Evolutionary Trees. 8th IEEE Symposium on Information Visualization (InfoVis 2002). Preliminary version; the final version will appear at the conference in October.

Here is the paper Jeff wrote for his undergraduate research thesis at the University of Texas. It describes the basic problems we face, our use of multidimensional scaling (MDS) to address them, and the software project he implemented.

Klingner, Jeff. 2001. Visualizing Sets of Evolutionary Trees. The University of Texas at Austin, Department of Computer Sciences. Technical Report CS-TR-01-26. 19 pages.

In addition to visualization, we have also been working on automatic clustering of phylogenetic trees.

Cara Stockham, Li-San Wang, and Tandy Warnow. Statistically Based Postprocessing of Phylogenetic Analysis by Clustering. To appear in the 10th International Conference on Intelligent Systems and Molecular Biology (ISMB'02), August 2002.

Links

We've made various other Web pages for this project as we went along.

(This page was last modified on Friday, June 28, 2002)