Influence of tree topology restrictions on the complexity of haplotyping with missing data

  • Authors: Michael Elberfeld; Ilka Schnoor; Till Tantau

  • Affiliations: Institut für Theoretische Informatik, Universität zu Lübeck, D-23538 Lübeck, Germany; Institut für Informatik, Christian-Albrechts-Universität zu Kiel, D-24118 Kiel, Germany; Institut für Theoretische Informatik, Universität zu Lübeck, D-23538 Lübeck, Germany

  • Venue: Theoretical Computer Science
  • Year: 2012

Abstract

Haplotyping, also known as haplotype phase prediction, is the problem of predicting likely haplotypes from genotype data. One fast haplotyping method is based on an evolutionary model in which a perfect phylogenetic tree is sought that explains the observed data. Unfortunately, when data entries are missing, as is often the case in laboratory data, the resulting formal problem IPPH (incomplete perfect phylogeny haplotyping) is NP-complete. Even radically simplified versions, such as the restriction to phylogenetic trees consisting of just two directed paths from a given root, remain NP-complete, although for that restriction a fixed-parameter algorithm is known. Such drastic and ad hoc simplifications turn out to be unnecessary to make IPPH tractable: we present the first theoretical analysis of a parameterized algorithm, developed in the course of the paper, that works for arbitrary instances of IPPH. This tractability result is optimal insofar as we prove IPPH to be NP-complete whenever any of the parameters we consider is not fixed but part of the input.
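
To make the objects in the abstract concrete, the following minimal Python sketch shows the two checks that underlie incomplete perfect phylogeny haplotyping. It is an illustration only, not the parameterized algorithm of the paper. The encoding is an assumption: haplotype entries are 0 or 1, genotype entries are 0 or 1 (homozygous), 2 (heterozygous), or "?" (missing). The function names `explains` and `admits_perfect_phylogeny` are hypothetical. The second function uses the standard four-gamete test, which characterizes complete binary matrices that admit an (undirected) perfect phylogeny.

```python
from itertools import combinations

MISSING = "?"  # assumed marker for a missing genotype entry


def explains(h1, h2, genotype):
    """Return True if the haplotype pair (h1, h2) is consistent with genotype.

    Assumed encoding: genotype entry 0 or 1 means both haplotypes carry that
    value, 2 means the haplotypes differ, MISSING imposes no constraint.
    """
    for a, b, g in zip(h1, h2, genotype):
        if g == MISSING:
            continue  # a missing entry is compatible with any pair of values
        if g == 2:
            if a == b:
                return False  # heterozygous site needs differing values
        elif not (a == b == g):
            return False  # homozygous site needs both values equal to g
    return True


def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test on a complete binary haplotype matrix.

    The rows admit a perfect phylogeny iff no pair of columns shows all
    four combinations 00, 01, 10, 11.
    """
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True
```

The difficulty described in the abstract comes from combining both checks: one must choose haplotype pairs, including values for the missing entries, such that every genotype is explained and the resulting haplotype set still passes the phylogeny test. Trying all choices naively is exponential, which is why parameterized algorithms are of interest.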