Seeing the trees and their branches in the network is hard

  • Authors:
  • Iyad A. Kanj; Luay Nakhleh; Cuong Than; Ge Xia

  • Affiliations:
  • School of Computer Science, Telecommunications, and Information Systems, DePaul University, 243 S. Wabash Avenue, Chicago, IL 60604-2301, USA; Department of Computer Science, Rice University, 6100 Main Street, MS 132 Houston, TX 77005-1892, USA; School of Computer Science, Telecommunications, and Information Systems, DePaul University, 243 S. Wabash Avenue, Chicago, IL 60604-2301, USA; Department of Computer Science, Acopian Engineering Center, Lafayette College, Easton, PA 18042, USA

  • Venue:
  • Theoretical Computer Science
  • Year:
  • 2008

Abstract

Phylogenetic networks are a restricted class of directed acyclic graphs that model evolutionary histories in the presence of reticulate evolutionary events, such as horizontal gene transfer, hybrid speciation, and recombination. Characterizing a phylogenetic network as a collection of trees and their branches has long been the basis for several methods of reconstructing and evaluating phylogenetic networks. Further, these characterizations have been used to understand molecular sequence evolution on phylogenetic networks. In this paper, we address theoretical questions with regard to phylogenetic networks, their characterizations, and sequence evolution on them. In particular, we prove that the problem of deciding whether a given tree is contained inside a network is NP-complete. Further, we prove that the problem of deciding whether a branch of a given tree is also a branch of a given network is polynomially equivalent to that of deciding whether the evolution of a molecular character (site) on a network is governed by the infinite site model. Exploiting this equivalence, we establish the NP-completeness of both problems, and provide a parameterized algorithm that runs in time $O(2^{k/2} n^2)$, where $n$ is the total number of nodes and $k$ is the number of recombination nodes in the network, which significantly improves upon the trivial brute-force $O(2^k n)$ time algorithm for the problem. This reduction in time is significant, particularly when analyzing recombination hotspots.
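
To make the brute-force baseline mentioned in the abstract concrete, the following is a minimal Python sketch of one way to test whether a branch appears in some tree displayed by a network. It is not the paper's parameterized algorithm, only an illustration of exhaustive enumeration. Several things here are assumptions made for the example: the network is given as a dictionary from each node to its list of children, a branch is identified by the set of leaves below it (its cluster), and the names `tree_clusters` and `network_has_cluster` are hypothetical. When every recombination node has exactly two parents, the loop visits 2^k parent choices, each processed in time linear in the size of the network.

```python
from itertools import product


def tree_clusters(children, root, taxa):
    """Collect the non-empty leaf clusters of the tree rooted at `root`."""
    found = set()

    def visit(node):
        if node in taxa:
            cluster = frozenset([node])
        else:
            # Union of the children's clusters; empty if the node lost all
            # of its children when recombination edges were dropped.
            cluster = frozenset().union(
                *(visit(child) for child in children.get(node, []))
            )
        if cluster:
            found.add(cluster)
        return cluster

    visit(root)
    return found


def network_has_cluster(edges, root, cluster):
    """Brute force: try every way of keeping one parent per recombination node."""
    parents = {}
    for u, kids in edges.items():
        for v in kids:
            parents.setdefault(v, []).append(u)

    # Assumption: taxa are the nodes with a parent and no children.
    taxa = {v for v in parents if not edges.get(v)}
    # Recombination nodes are the nodes with more than one parent.
    recombination = [v for v, ps in parents.items() if len(ps) > 1]
    target = frozenset(cluster)

    for choice in product(*(parents[r] for r in recombination)):
        kept_parent = dict(zip(recombination, choice))
        # Keep an edge (u, v) unless v is a recombination node whose
        # chosen parent is a different node.
        children = {
            u: [v for v in kids if kept_parent.get(v, u) == u]
            for u, kids in edges.items()
        }
        if target in tree_clusters(children, root, taxa):
            return True
    return False


# Hypothetical network: leaf y hangs below recombination node h,
# which has the two parents a and b.
example = {
    "r": ["a", "b"],
    "a": ["x", "h"],
    "b": ["h", "z"],
    "h": ["y"],
}
```

On this example, `network_has_cluster(example, "r", {"x", "y"})` and `network_has_cluster(example, "r", {"y", "z"})` both hold, because each cluster appears in one of the two trees obtained by choosing a parent for `h`, while `{"x", "z"}` appears in neither. The sketch skips suppressing degree-two nodes, since that step does not change which leaf clusters a tree has.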