Comparing and aggregating partially resolved trees

  • Authors:
  • Mukul S. Bansal;Jianrong Dong;David Fernández-Baca

  • Affiliations:
  • Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA and Department of Computer Science, Iowa State University, Ames, IA 50011, ...;Department of Computer Science, Iowa State University, Ames, IA 50011, USA;Department of Computer Science, Iowa State University, Ames, IA 50011, USA

  • Venue:
  • Theoretical Computer Science
  • Year:
  • 2011

Quantified Score

Hi-index 5.23

Visualization

Abstract

Partially-resolved-that is, non-binary-trees arise frequently in the analysis of species evolution. Non-binary nodes, also called multifurcations, must be treated carefully, since they can be interpreted as reflecting either lack of information or actual evolutionary history. While several distance measures exist for comparing trees, none of them deal explicitly with this dichotomy. Here we introduce two kinds of distance measures between rooted and unrooted partially-resolved phylogenetic trees over the same set of species; the measures address multifurcations directly. For rooted trees, the measures are based on the topologies the input trees induce on triplets; that is, on three-element subsets of the set of species. For unrooted trees, the measures are based on quartets (four-element subsets). The first class of measures are parametric distances, where there is a parameter that weighs the difference between an unresolved triplet/quartet topology and a resolved one. The second class of measures are based on the Hausdorff distance, where each tree is viewed as a set of all possible ways in which the tree can be refined to eliminate unresolved nodes. We give efficient algorithms for computing parametric distances and give conditions under which Hausdorff distances can be calculated approximately in polynomial time. Additionally, we (i) derive the expected value of the parametric distance between two random trees, (ii) characterize the conditions under which parametric distances are near-metrics or metrics, (iii) study the computational and algorithmic properties of consensus tree methods based on the measures, and (iv) analyze the interrelationships among Hausdorff and parametric distances.