Highly Scalable Genotype Phasing by Entropy Minimization

  • Authors:
  • Alexander Gusev;Ion I. Măndoiu;Bogdan Paşaniuc

  • Venue:
  • IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
  • Year:
  • 2008

Abstract

A Single Nucleotide Polymorphism (SNP) is a position in the genome at which two or more of the four possible nucleotides occur in a large percentage of the population. SNPs account for most of the genetic variability between individuals, and mapping SNPs in the human population has become the next high priority in genomics after the completion of the Human Genome Project. In diploid organisms such as humans, there are two non-identical copies of each autosomal chromosome. A description of the SNPs in a chromosome is called a haplotype. At present, it is prohibitively expensive to directly determine the haplotypes of an individual, but it is relatively easy to obtain the conflated SNP information in the so-called genotype. Computational methods for genotype phasing, i.e., inferring haplotypes from genotype data, have received much attention in recent years, as haplotype information leads to increased statistical power of disease association tests. However, many of the existing algorithms have impractical running times for phasing large genotype datasets such as those generated by the International HapMap Project. In this paper we propose a highly scalable algorithm based on entropy minimization. Our algorithm is capable of phasing both unrelated and related genotypes coming from complex pedigrees. Experimental results on both real and simulated datasets show that our algorithm achieves a phasing accuracy slightly worse than, but close to, that of the best existing methods while being several orders of magnitude faster. The open-source implementation of the algorithm and a web interface are publicly available at http://dna.engr.uconn.edu/~software/ent/.
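
To make the phasing problem concrete, the following is a minimal toy sketch, not the algorithm proposed in the paper. It assumes a common genotype encoding (0 and 1 for the two homozygous states, 2 for heterozygous) and assumes that the quantity being minimized is the empirical Shannon entropy of the inferred haplotype frequencies; the abstract states neither. The function names `compatible_pairs`, `entropy`, and `min_entropy_phasing` are hypothetical, and the exhaustive search is exponential, so it is only usable on tiny inputs.

```python
from collections import Counter
from itertools import product
from math import log2


def compatible_pairs(genotype):
    """Return all unordered haplotype pairs that explain a genotype.

    Assumed encoding: 0 = homozygous allele 0, 1 = homozygous allele 1,
    2 = heterozygous (the two haplotypes differ at that SNP).
    """
    het = [i for i, g in enumerate(genotype) if g == 2]
    base = [g if g != 2 else 0 for g in genotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for pos, b in zip(het, bits):
            h1[pos], h2[pos] = b, 1 - b
        # Sort so that (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def entropy(haplotypes):
    """Empirical Shannon entropy (in bits) of a list of haplotypes."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return -sum(c / n * log2(c / n) for c in counts.values())


def min_entropy_phasing(genotypes):
    """Brute-force search for the phasing with the lowest entropy.

    Toy illustration only: tries every combination of compatible pairs.
    """
    best = None
    for choice in product(*(compatible_pairs(g) for g in genotypes)):
        haps = [h for pair in choice for h in pair]
        e = entropy(haps)
        if best is None or e < best[0]:
            best = (e, choice)
    return best


if __name__ == "__main__":
    # Two homozygous individuals and one double heterozygote.
    score, phasing = min_entropy_phasing([(0, 0), (1, 1), (2, 2)])
    print(score, phasing)
```

In this example the doubly heterozygous genotype `(2, 2)` is ambiguous: it can be explained by the haplotype pair `00/11` or by `01/10`. Under the assumed objective, the search prefers `00/11` because it reuses haplotypes already present in the other two individuals, which keeps the haplotype frequency distribution concentrated.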