The most frequently used techniques for reconstructing trees from biomolecular sequence data attempt to solve essentially intractable problems. The result of such a phylogenetic analysis is therefore usually not a single optimal tree, but rather the set of all the "best" trees found during the search. We are interested in using data visualization to help explain the biological meaning of these results.
Here is an example using a small set of the "best" trees on several sunflower taxa, from an analysis performed by Prof. Bob Jansen.
Each point represents a tree; we will explain the colors in a moment. On the left, we used multidimensional scaling (MDS), a standard technique, to compute an embedding of the set of tree-points into the plane so that the distances between points match, as closely as possible, the distances between trees, measured by a combinatorial metric, the Robinson-Foulds distance. The plot on the right compares each distance in the display on the left with the corresponding true distance. If the data admitted a perfect embedding, the plot would form a straight line; in this case there is some distortion.
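As a rough illustration of the distance used above, here is a minimal sketch of the Robinson-Foulds distance. It assumes each tree is written as nested Python tuples of taxon names (a representation chosen only for this example, not the format used in the analysis), and it counts the nontrivial bipartitions that appear in one tree but not the other. The helper names `leaves`, `splits` and `rf_distance` and the five-taxon toy trees are hypothetical.

```python
def leaves(tree):
    """Return the set of taxon names below a node (a name or a tuple of nodes)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Collect the nontrivial bipartitions of a tree given as nested tuples.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference taxon, so the same split is always written the same way.
    """
    all_leaves = leaves(tree)
    anchor = min(all_leaves)  # arbitrary but fixed reference taxon
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        # Keep only splits with at least two taxa on each side.
        if 1 < len(below) < len(all_leaves) - 1:
            found.add(below if anchor not in below else all_leaves - below)
        return below

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


# Toy trees on five taxa (assumed data, for illustration only).
t1 = (("A", "B"), "C", ("D", "E"))
t2 = (("A", "C"), "B", ("D", "E"))
t3 = (("A", "B"), "D", ("C", "E"))

trees = [t1, t2, t3]
distance_matrix = [[rf_distance(a, b) for b in trees] for a in trees]
```

The resulting `distance_matrix` is the kind of input an MDS routine would take to place each tree as a point.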
Embedding the points in 3D lets us reduce the distortion somewhat. Interactively rotating the data in 3D gives a good feel for the shape of the data; on the left we show one view in which the points fall naturally into four clusters.
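To make the trade-off between 2D and 3D concrete, here is a small sketch using classical (Torgerson) MDS with NumPy. Which MDS variant produced the displays above is not stated, so the choice of classical MDS is an assumption, as are the function names `classical_mds` and `distortion` and the toy distance matrix. The toy input is four mutually equidistant points, which cannot be drawn in the plane without distortion but fit exactly in 3D.

```python
import numpy as np


def classical_mds(dist, k):
    """Embed points in k dimensions from a symmetric distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering      # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]           # indices of the k largest
    # Non-Euclidean inputs can give negative eigenvalues; clip them to zero.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


def distortion(dist, coords):
    """Root-mean-square gap between true and embedded pairwise distances."""
    dist = np.asarray(dist, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    embedded = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(dist), k=1)       # each pair counted once
    return float(np.sqrt(np.mean((dist[upper] - embedded[upper]) ** 2)))


# Toy input (assumed): four points, every pair at distance 1.
toy = np.ones((4, 4)) - np.eye(4)
err_2d = distortion(toy, classical_mds(toy, 2))
err_3d = distortion(toy, classical_mds(toy, 3))
```

Plotting the true distances against the embedded ones for every pair gives the kind of comparison plot described earlier; `distortion` condenses that plot into a single number.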
One way to summarize the "evolutionary story" told by the data set is with a strict consensus tree, which includes only the edges agreed upon by all the trees. Here is the strict consensus tree for the whole data set.
Using a strict consensus tree is satisfactory when the data present one clear common evolutionary history, but not when the set contains trees that are significantly different. This occurs for a variety of reasons, including actual differences in evolutionary histories between genes, hybridization and recombination events leading to non-tree-like evolution, or simply inadequate data or computational time. In these cases, the consensus tree fails to be well-resolved (that is, it contains many nodes of high degree), and hence provides little information.
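The loss of resolution can be sketched in a few lines. The example below assumes each tree is already given as a set of bipartitions, each written as the side that excludes a fixed reference taxon (here "A"), and uses the fact that an unrooted binary tree on n taxa has n - 3 internal edges. The function names and the five-taxon trees are hypothetical and chosen only for illustration.

```python
def strict_consensus(split_sets):
    """Keep only the bipartitions that appear in every tree."""
    return set.intersection(*(set(s) for s in split_sets))


def resolution(consensus_splits, n_taxa):
    """Fraction of the n - 3 possible internal edges that the consensus retains."""
    return len(consensus_splits) / (n_taxa - 3)


# Toy trees on taxa A..E (assumed data). frozenset("CDE") is the set {C, D, E}.
tree_a = {frozenset("CDE"), frozenset("DE")}
tree_b = {frozenset("BDE"), frozenset("DE")}
tree_c = {frozenset("CDE"), frozenset("CE")}

similar = strict_consensus([tree_a, tree_b])             # trees that mostly agree
conflicting = strict_consensus([tree_a, tree_b, tree_c])  # one conflicting tree added

similar_resolution = resolution(similar, 5)
conflicting_resolution = resolution(conflicting, 5)
```

In this toy case the two similar trees still share one edge, while adding a single conflicting tree leaves the consensus with no internal edges at all, that is, a completely unresolved star.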
One way of separating out these conflicting evolutionary stories is to break the set of trees down into clusters. An accepted clustering technique is to separate the set of "best" trees into phylogenetic islands, defined by the neighborhoods in the search process used to find them. In this example, one island is red and the other is yellow. Here are the consensus trees for the two islands:
In our visualizations, three distinct clusters are visible in the red island. Each of these clusters gives a more specific version of the possible evolutionary history.
For more ideas about how to apply visualization in phylogenetics, see our Web page.
Jeff Klingner (UT undergrad, double major in Biology and CS), Prof. Nina Amenta and Prof. Tandy Warnow (UT CS)
December 2000