Haplotype assembly from aligned weighted SNP fragments

  • Authors:
  • Yu-Ying Zhao;Ling-Yun Wu;Ji-Hong Zhang;Rui-Sheng Wang;Xiang-Sun Zhang

  • Affiliations:
  • Institute of Applied Mathematics, Academy of Mathematics and Systems Science, CAS, Beijing 100080, China (all authors)

  • Venue:
  • Computational Biology and Chemistry
  • Year:
  • 2005

Abstract

Given an assembled genome of a diploid organism, the haplotype assembly problem can be formulated as the retrieval of a pair of haplotypes from a set of aligned weighted SNP fragments. Known computational formulations (models) of this problem are minimum letter flips (MLF) and weighted minimum letter flips (WMLF; Greenberg et al. (INFORMS J. Comput. 2004, 14, 211-213)). In this paper we show that the general WMLF model is NP-hard even for the gapless case. However, algorithmic solutions for selected variants of WMLF can exist, and we propose a heuristic algorithm based on a dynamic clustering technique. We also introduce a new formulation of the haplotype assembly problem that we call COMPLETE WMLF (CWMLF). This model and the algorithms that implement it take into account the simultaneous presence of multiple kinds of data errors. Extensive computational experiments indicate that the algorithmic implementations of the CWMLF model achieve higher accuracy of haplotype reconstruction than the WMLF-based algorithms, which in turn appear to be more accurate than those based on MLF.
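
To make the MLF and WMLF objectives concrete, the following is a minimal sketch of how a candidate haplotype pair might be scored against a set of weighted fragments. It is not the authors' algorithm. It assumes a binary allele encoding (0/1), gaps marked as None, a second haplotype that is the bitwise complement of the first (every site heterozygous), and a cost equal to the summed weights of the calls that would have to be flipped. The names weighted_distance and wmlf_cost are hypothetical.

```python
from typing import List, Optional, Sequence

# Assumed encoding: 0/1 allele calls, None where the fragment has a gap.
Fragment = Sequence[Optional[int]]


def weighted_distance(fragment: Fragment,
                      weights: Sequence[float],
                      haplotype: Sequence[int]) -> float:
    """Sum the weights of the covered sites where the fragment disagrees
    with the haplotype; gap sites contribute nothing."""
    return sum(w for allele, w, h in zip(fragment, weights, haplotype)
               if allele is not None and allele != h)


def wmlf_cost(fragments: Sequence[Fragment],
              weights: Sequence[Sequence[float]],
              h1: Sequence[int]) -> float:
    """Score a candidate haplotype pair (h1 and its complement).

    Each fragment is assigned to whichever haplotype it is closer to, and
    the weighted distance to that haplotype is added to the total.
    Assumption: the two haplotypes are complementary at every SNP site.
    """
    h2: List[int] = [1 - allele for allele in h1]
    total = 0.0
    for fragment, w in zip(fragments, weights):
        total += min(weighted_distance(fragment, w, h1),
                     weighted_distance(fragment, w, h2))
    return total


# Illustrative data only (not taken from the paper).
fragments = [
    [0, 0, 1, None],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
weights = [
    [0.9, 0.8, 0.9, 0.0],
    [0.9, 0.9, 0.7, 0.8],
    [0.9, 0.3, 0.9, 0.6],
]
candidate = [0, 0, 1, 1]

weighted_score = wmlf_cost(fragments, weights, candidate)

# With unit weights the same function counts flips, as in the MLF model.
unit_weights = [[1.0] * len(candidate) for _ in fragments]
unweighted_score = wmlf_cost(fragments, unit_weights, candidate)
```

Under these assumptions, evaluating one candidate is linear in the total number of fragment entries. The hardness result stated in the abstract concerns finding the pair that minimizes this cost, not evaluating a given pair.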