Gene tree correction for reconciliation and species tree inference: Complexity and algorithms

  • Authors:
  • Riccardo Dondi;Nadia El-Mabrouk;Krister M. Swenson

  • Affiliations:
  • Dipartimento di Scienze Umane e Sociali, Università degli Studi di Bergamo, Bergamo, Italy;Département d'Informatique et Recherche Opérationnelle, Université de Montréal, Montréal, Canada;Département d'Informatique et Recherche Opérationnelle, Université de Montréal, Montréal, Canada and Department of Computer Science, McGill, Montréal, Canada

  • Venue:
  • Journal of Discrete Algorithms
  • Year:
  • 2014

Abstract

Reconciliation consists of mapping a gene tree T into a species tree S, and explaining the incongruence between the two as evidence for duplication, loss and other events shaping the gene family represented by the leaves of T. When S is unknown, the Species Tree Inference Problem is to infer, from a set of gene trees, a species tree leading to a minimum reconciliation cost. As reconciliation is very sensitive to errors in T, gene tree correction prior to reconciliation is a fundamental task. In this paper, we investigate the complexity of four different combinatorial approaches for deleting misplaced leaves from T. First, we consider two problems (Minimum Leaf Removal and Minimum Species Removal) related to the reconciliation of T with a known species tree S. In the former (respectively, the latter) we want to remove the minimum number of leaves (respectively, species) so that T is "MD-consistent" with S. Second, we consider two problems (Minimum Leaf Removal Inference and Minimum Species Removal Inference) related to species tree inference. In the former (respectively, the latter) we want to remove the minimum number of leaves (respectively, species) from T so that there exists a species tree S such that T is MD-consistent with S. We prove that Minimum Leaf Removal and Minimum Species Removal are APX-hard, even when each label has at most two occurrences in the input gene tree, and we present fixed-parameter algorithms for the two problems. We prove that Minimum Leaf Removal Inference is not only NP-hard, but also W[2]-hard and inapproximable within a factor c ln n, where n is the number of leaves in the gene tree. Finally, we show that Minimum Species Removal Inference is NP-hard and W[2]-hard when parameterized by the size of the solution, that is, the minimum number of species removals.
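
To make the notion of reconciliation concrete, the following is a minimal sketch of the standard LCA mapping, in which each gene-tree node is mapped to the lowest common ancestor of the species below it and is flagged as a duplication when it maps to the same species-tree node as one of its children. This is not the paper's algorithm and it does not test MD-consistency, which the abstract does not define. The nested-tuple tree encoding, the "species_geneid" leaf naming, and the function names are all assumptions made for illustration.

```python
# Illustrative sketch only. Trees are nested tuples; leaves are strings.
# Assumption: gene leaves are named "species_geneid" (hypothetical convention).

def parent_and_depth(species_tree):
    """Return (parent, depth) dicts keyed by species-tree node."""
    parent, depth = {}, {}
    stack = [(species_tree, None, 0)]
    while stack:
        node, par, d = stack.pop()
        parent[node], depth[node] = par, d
        if isinstance(node, tuple):
            for child in node:
                stack.append((child, node, d + 1))
    return parent, depth


def lca(a, b, parent, depth):
    """Lowest common ancestor of two species-tree nodes."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def duplications(gene_tree, species_tree):
    """Return the gene-tree nodes flagged as duplications by LCA mapping."""
    parent, depth = parent_and_depth(species_tree)
    flagged = []

    def visit(node):
        if isinstance(node, str):
            # A gene leaf maps to the species it was sampled from.
            return node.split("_")[0]
        child_maps = [visit(child) for child in node]
        mapped = child_maps[0]
        for other in child_maps[1:]:
            mapped = lca(mapped, other, parent, depth)
        # Duplication: the node maps to the same place as one of its children.
        if mapped in child_maps:
            flagged.append(node)
        return mapped

    visit(gene_tree)
    return flagged


S = (("a", "b"), "c")
T_noisy = (("a_1", "b_1"), ("a_2", "c_1"))   # leaf a_2 looks misplaced
T_pruned = (("a_1", "b_1"), "c_1")           # same tree with a_2 removed
```

Under these assumptions, `duplications(T_noisy, S)` flags the root of the gene tree, while `duplications(T_pruned, S)` flags nothing, which illustrates how a single misplaced leaf can force a duplication that disappears once that leaf is removed.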