ReFHap: a reliable and fast algorithm for single individual haplotyping

  • Authors:
  • Jorge Duitama; Thomas Huebsch; Gayle McEwen; Eun-Kyung Suk; Margret R. Hoehe

  • Affiliations:
  • University of Connecticut, Storrs, CT; Max Planck Institute for Molecular Genetics, Berlin, Germany; Max Planck Institute for Molecular Genetics, Berlin, Germany; Max Planck Institute for Molecular Genetics, Berlin, Germany; Max Planck Institute for Molecular Genetics, Berlin, Germany

  • Venue:
  • Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
  • Year:
  • 2010

Abstract

Full human genomic sequences have been published in the last two years for a growing number of individuals. Most of them are a mixed consensus of the two real haplotypes, because it is still very expensive to separate the information coming from the two copies of a chromosome. However, recent improvements and new experimental approaches promise to solve these issues and to provide enough information to reconstruct the sequences of the two copies of each chromosome through bioinformatics methods such as single individual haplotyping. Full haploid sequences provide a complete understanding of the structure of the human genome, allowing accurate predictions of translation in protein-coding regions and increasing the power of association studies. In this paper we present a novel problem formulation for single individual haplotyping. We start by assigning a score to each pair of fragments based on their common allele calls, and then use these scores to formulate the problem as finding a cut of the fragments that maximizes an objective function, similar to the well-known max-cut problem. Our algorithm first finds the best cut based on a heuristic algorithm for max-cut and then builds haplotypes consistent with that cut. We have compared both the accuracy and the running time of ReFHap with those of other heuristic methods on both simulated and real data, and found that ReFHap runs significantly faster than previous methods without loss of accuracy.
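
The three steps named in the abstract (score fragment pairs, find a cut, build haplotypes from the cut) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, since the abstract does not give the details: the score (mismatches minus matches on shared SNP positions), the simple flip-based local search used as the max-cut heuristic, the majority vote used to build the haplotype, and the function names `pair_score`, `local_search_cut`, and `build_haplotype`. It is not the ReFHap implementation.

```python
from itertools import combinations

# A fragment is a dict {snp_position: allele}, with alleles coded 0 or 1.

def pair_score(f, g):
    """Assumed score: mismatches minus matches over positions both fragments cover."""
    common = f.keys() & g.keys()
    return sum(1 if f[p] != g[p] else -1 for p in common)

def local_search_cut(fragments):
    """Illustrative max-cut heuristic: flip single fragments while the cut weight grows."""
    n = len(fragments)
    w = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        w[i][j] = w[j][i] = pair_score(fragments[i], fragments[j])
    side = [0] * n
    improved = True
    while improved:
        improved = False
        for i in range(n):
            # Flipping i adds its same-side edges to the cut and removes its cut edges.
            gain = sum(w[i][j] if side[i] == side[j] else -w[i][j]
                       for j in range(n) if j != i)
            if gain > 0:
                side[i] ^= 1
                improved = True
    return side

def build_haplotype(fragments, side):
    """Majority vote per SNP; fragments on side 1 are read as the complementary haplotype."""
    votes = {}
    for f, s in zip(fragments, side):
        for pos, allele in f.items():
            votes[pos] = votes.get(pos, 0) + (1 if (allele ^ s) == 1 else -1)
    return {pos: 1 if v > 0 else 0 for pos, v in sorted(votes.items())}

if __name__ == "__main__":
    # Toy error-free fragments drawn from haplotypes 0000 and 1111.
    fragments = [{0: 0, 1: 0}, {1: 1, 2: 1}, {1: 0, 2: 0, 3: 0}, {2: 1, 3: 1}, {0: 1, 1: 1}]
    side = local_search_cut(fragments)
    print(side, build_haplotype(fragments, side))
```

Each accepted flip strictly increases the cut weight, so the local search terminates, but it can stop at a local optimum. The second haplotype is the bitwise complement of the returned one, under the usual assumption that only heterozygous SNPs are considered.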