Insights on haplotype inference on large genotype datasets

  • Authors: Rogério S. Rosa; Katia S. Guimarães
  • Affiliations: Federal University of Pernambuco, Center of Informatics, Recife, Brazil (both authors)
  • Venue: BSB'10 Proceedings of the Advances in bioinformatics and computational biology, and 5th Brazilian conference on Bioinformatics
  • Year: 2010

Abstract

In this paper we present insights into the problem of haplotype inference for large genotype datasets. Our observations are drawn from an extensive comparison of three haplotype inference methods on several datasets taken from HapMap. The methods chosen, PTG, Haplorec, and fastPHASE, are among the best known; they are based on different approaches and are able to handle large amounts of data. Our analysis considers execution time and the accuracy of the results, measured by the Error Rate and the Switch Error, as well as sequence conservation patterns. The results show that (1) fastPHASE and Haplorec are both more accurate than PTG, (2) fastPHASE is computationally the most expensive of the three methods, while Haplorec may fail to resolve long sequences, and (3) all approaches do better with more conserved sequences, and tend to fail at distinct sequence sites.
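
The abstract names two accuracy measures, the Error Rate and the Switch Error, without defining them. The sketch below illustrates definitions that are common in the haplotype inference literature; it is an assumption that the paper uses the same ones. Here the Switch Error is taken as the fraction of adjacent heterozygous-site pairs whose relative phase is inferred incorrectly, and the Error Rate as the fraction of individuals whose inferred haplotype pair differs from the true pair. The function names and the string encoding of haplotypes are hypothetical, chosen only for illustration.

```python
def switch_error(true_pair, inferred_pair):
    """Fraction of adjacent heterozygous-site pairs with wrong relative phase.

    Assumes both pairs explain the same genotype, so at every heterozygous
    site the first inferred haplotype matches exactly one true haplotype.
    """
    t1, t2 = true_pair
    i1, _ = inferred_pair
    # Heterozygous sites are those where the two true haplotypes differ.
    het = [k for k in range(len(t1)) if t1[k] != t2[k]]
    if len(het) < 2:
        return 0.0  # phase is trivially determined
    # For each heterozygous site, does inferred haplotype 1 follow true haplotype 1?
    follows_t1 = [i1[k] == t1[k] for k in het]
    # A switch occurs whenever that answer changes between consecutive sites.
    switches = sum(1 for a, b in zip(follows_t1, follows_t1[1:]) if a != b)
    return switches / (len(het) - 1)


def error_rate(true_pairs, inferred_pairs):
    """Fraction of individuals whose inferred pair is not the true pair.

    Pairs are compared as unordered sets, since haplotype order is arbitrary.
    """
    wrong = sum(1 for t, i in zip(true_pairs, inferred_pairs) if set(t) != set(i))
    return wrong / len(true_pairs)


# Toy data: haplotypes encoded as strings over {'0', '1'} (an assumed encoding).
true_pairs = [("0101", "1010"), ("00", "11")]
inferred_pairs = [("0110", "1001"), ("11", "00")]

se_first = switch_error(true_pairs[0], inferred_pairs[0])
er = error_rate(true_pairs, inferred_pairs)
```

In the toy data, the first individual is phased incorrectly with a single switch between the second and third sites, while the second individual is phased correctly with the two haplotypes listed in the opposite order.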