Post-processing sets of phylogenetic trees

For a variety of reasons, a typical outcome of a phylogenetic analysis of a dataset can consist of many different unrooted trees, and each tree represents an equally believable estimate of the true tree. Making sense of the set of these trees is then a challenging prospect.

There are two basic approaches that have been used for this problem. The first approach (and the most popular) represents the set of trees by a single tree on the full dataset (i.e. the ``consensus tree"). Consensus tree techniques such as ``strict consensus" and ``majority consensus" are the most popular, and have the advantage that they are polynomial time. However, consensus tree methods have the disadvantage that they tend to create consensus trees that lack resolution (and this lack of resolution can be significant), meaning that the trees can contain high degree nodes. An alternate approach seeks a subset of the taxa on which all the trees agree (the ``maximum agreement subset" approach), which means that when restricted to this subset of the taxa, the trees are identical. By contrast with the consensus tree approach, the agreement subset approach is computationally intensive (unless at least one of the trees has low degree); furthermore, the size of the maximum agreement subset can be very small.

An alternate approach is to seek the largest subset of taxa on which all the trees are ``compatible" (meaning that when restricted to this subset of taxa, the trees share a common refinement). The problem is suited for the case where the set of trees contains unresolved trees (such as can happen when returning a set of consensus trees, one for each phylogenetic island obtained during a search for the maximum parsimony or maximum likelihood tree). In such cases, it may return a larger subset of taxa than the maximum agreement subset approach.

We work on developing efficient approximation and fixed-parameter tractable algorithms for the Maximum Compatible Tree problem and in general for the problem of post-processing the outcome of a phylogenetic analysis so as to extract meaningful information common to all the trees returned.

People

Faculty

Students


Publications