Phylogenetic Networks

Background Information

Introduction
Phylogenies, i.e., the evolutionary histories of groups of organisms, play a major role in representing the interrelationships among biological entities. Many methods for reconstructing such phylogenies have been proposed, but almost all of them assume that the underlying evolutionary history of a given set of species can be represented by a tree. While this model gives a satisfactory first-order approximation for many families of organisms, other families exhibit evolutionary mechanisms that cannot be represented by a tree. Processes such as hybridization and horizontal gene transfer result in networks of relationships rather than trees of relationships. Although this problem is widely appreciated, there has been comparatively little work on computational methods for estimating evolutionary networks.

Hybridization and gene transfer
Two of the mechanisms that can result in non-tree evolution are hybridization and horizontal gene transfer. In these two cases, the true evolutionary history is best represented by a network, or directed acyclic graph, rather than by a tree.

Consider how an individual site evolves down a network. For diploid organisms, each chromosome consists of a pair of homologs. In a diploid hybridization event, the hybrid inherits one of the two homologs for each chromosome from each of its two parents. Since homologs assort at random into the gametes (sex cells), each has an equal probability of ending up in the hybrid. In polyploid hybridization, both homologs from both parents are contributed to the hybrid. Prior to the hybridization, each site on the homolog has evolved in a tree-like fashion, although due to meiotic recombination (exchanges between the parental homologs during gamete production), different stretches of sites may have different histories. Thus each site in the homologs of the parents of the hybrid evolved in a tree-like fashion on one of the trees contained inside (or induced by) the network representing the hybridization event. Similarly, in an evolutionary scenario involving horizontal transfer, certain sites are inherited through horizontal transfer from another species, while all others are inherited from the parent. Thus, in each of these two scenarios, each site evolves down one of the trees induced by the network.
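The idea of trees induced by a network can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the network is given as a list of (parent, child, weight) tuples, a node with two incoming edges is a hybrid, and an induced tree is obtained by keeping exactly one incoming edge for each hybrid node. The example network, its weights, and the function name induced_trees are hypothetical and chosen for illustration. The sketch does not suppress nodes left with a single child after an edge is dropped.

```python
from itertools import product

# Hypothetical network: node "h" is a hybrid of "u" and "v".
# Each edge is (parent, child, weight); the weights are made up.
EDGES = [
    ("root", "u", 0.10),
    ("root", "v", 0.12),
    ("u", "A", 0.05),
    ("u", "h", 0.02),
    ("v", "h", 0.03),
    ("v", "C", 0.07),
    ("h", "B", 0.04),
]


def induced_trees(edges):
    """Return every edge list obtained by keeping one parent per hybrid node."""
    # Group incoming edges by child node.
    incoming = {}
    for edge in edges:
        incoming.setdefault(edge[1], []).append(edge)

    # Hybrid nodes are those with exactly two incoming edges.
    hybrid_nodes = sorted(n for n, ins in incoming.items() if len(ins) == 2)

    trees = []
    # For k hybrid nodes there are 2**k ways to pick the retained parent.
    for choice in product(range(2), repeat=len(hybrid_nodes)):
        dropped = {incoming[h][1 - c] for h, c in zip(hybrid_nodes, choice)}
        trees.append([e for e in edges if e not in dropped])
    return trees


if __name__ == "__main__":
    for i, tree in enumerate(induced_trees(EDGES), start=1):
        print(f"tree {i}:")
        for parent, child, weight in tree:
            print(f"  {parent} -> {child} ({weight})")
```

With one hybrid node, the example yields two trees: one in which B descends from u and one in which B descends from v. Under the model described above, each site would follow one of these two histories.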

Representation
Phylogenetic networks can be represented by rooted directed acyclic graphs, where each node (except for the root) has indegree 1 or 2. Nodes of indegree 1 are called tree nodes, whereas nodes of indegree 2 are called hybrid nodes. A hybrid node typically takes its genetic material from both of its parents, whereas a tree node takes its genetic material from its sole parent. The leaves of a network represent the extant taxa, and the internal nodes represent the hypothetical ancestral taxa. Whereas phylogenetic trees have a standard representation, the Newick format (a nested-parenthesis notation), no such representation exists for phylogenetic networks. We thus simply represent a network as a list of its edges, where each edge is defined by its two endpoints and its weight (the expected number of changes along that edge).
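A minimal Python sketch of this edge-list representation follows. It assumes each edge is stored as a (parent, child, weight) tuple and classifies nodes by indegree as described above. The example network, its weights, and the function name classify_nodes are hypothetical. They illustrate the idea and are not a file format or API defined by this project.

```python
# Hypothetical network as an edge list: (parent, child, weight), where the
# weight stands for the expected number of changes along the edge.
EDGES = [
    ("root", "u", 0.10),
    ("root", "v", 0.12),
    ("u", "A", 0.05),
    ("u", "h", 0.02),
    ("v", "h", 0.03),
    ("v", "C", 0.07),
    ("h", "B", 0.04),
]


def classify_nodes(edges):
    """Group nodes into root, tree, hybrid, and leaf sets by degree."""
    indegree = {}
    outdegree = {}
    for parent, child, _weight in edges:
        for node in (parent, child):
            indegree.setdefault(node, 0)
            outdegree.setdefault(node, 0)
        indegree[child] += 1
        outdegree[parent] += 1

    classes = {"root": [], "tree": [], "hybrid": [], "leaf": []}
    for node in sorted(indegree):
        if indegree[node] == 0:
            classes["root"].append(node)
        elif indegree[node] == 1:
            classes["tree"].append(node)
        elif indegree[node] == 2:
            classes["hybrid"].append(node)
        else:
            raise ValueError(f"node {node!r} has indegree {indegree[node]}")
        # Leaves (extant taxa) have no outgoing edges; they are also
        # tree or hybrid nodes according to their indegree.
        if outdegree[node] == 0:
            classes["leaf"].append(node)
    return classes


if __name__ == "__main__":
    for kind, nodes in classify_nodes(EDGES).items():
        print(kind, nodes)
```

In this example, h is the only hybrid node, and A, B, and C are the leaves. A node with indegree above 2 raises an error, since the representation described above allows only indegree 1 or 2 outside the root.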

People

Faculty

Graduate Students

Undergraduate Students

Papers