Post-processing sets of phylogenetic trees
For a variety of reasons, a typical
outcome of a phylogenetic analysis of a dataset can consist
of many different unrooted trees, and each tree represents an
equally believable estimate of the true tree. Making sense
of the set of these trees is then a challenging prospect.
There are two basic approaches that have
been used for this problem. The first approach (and the
most popular)
represents the set of trees by a single tree on the full dataset (i.e.
the ``consensus tree"). Consensus tree techniques such
as ``strict consensus" and ``majority consensus" are the
most popular, and have the
advantage that they are polynomial time. However,
consensus tree methods have the disadvantage that they tend to create
consensus trees that lack resolution (and this lack of
resolution can be significant), meaning that the
trees can contain high
degree nodes.
An alternate approach seeks a subset of
the taxa on which all the trees agree (the ``maximum agreement
subset" approach), which means
that when restricted to this subset of
the taxa, the trees are identical.
By contrast with the consensus tree approach, the
agreement subset approach is computationally intensive
(unless at least one of the trees has low degree);
furthermore, the size of the maximum agreement subset can
be very small.
An alternate approach is to seek the largest subset
of taxa on which all the trees are ``compatible" (meaning
that when restricted to this subset of taxa, the trees share
a common refinement).
The problem is suited for the case where the set of trees
contains unresolved trees (such as can happen when returning
a set of consensus trees, one for each phylogenetic island
obtained during a search for the maximum parsimony or maximum
likelihood tree). In such cases, it may return a larger
subset of taxa than the maximum agreement subset approach.
We work on developing efficient approximation and fixed-parameter tractable
algorithms for the Maximum Compatible Tree problem and in general for the
problem of post-processing the outcome of a phylogenetic analysis so as to
extract meaningful information common to all the trees returned.
People
Faculty
Students
Publications
-
Ganapathysaravanabavan, G. and Warnow, T., ``Finding a maximum
compatible trees for a bounded number of trees with bounded degree is
solvable in polynomial time,'' In: Gascuel, O., and
Moret, B.M.E. (eds): Proceedings of the First International Workshop on
Algorithms in Bioinformatics (WABI 2001) (Lecture Notes in Computer Science
2149) (2001) 156-163. ps.
-
Ganapathy G. and Warnow, T., ``Approximating the complement of the Maximum
Compatible Subset of Leaves of k Trees'' Proceedings of the Fifth
International Workshop on Approximation Algorithms for Combinatorial
Optimization (APPROX 2002) ps.