Highly Scalable Genotype Phasing by Entropy Minimization

Authors:
Alexander Gusev; Ion I. Măndoiu; Bogdan Paşaniuc
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2008

Abstract

A Single Nucleotide Polymorphism (SNP) is a position in the genome at which two or more of the four possible nucleotides occur in a large percentage of the population. SNPs account for most of the genetic variability between individuals, and mapping SNPs in the human population has become the next high priority in genomics after the completion of the Human Genome Project. In diploid organisms such as humans, there are two non-identical copies of each autosomal chromosome. A description of the SNPs in a chromosome is called a haplotype. At present, it is prohibitively expensive to determine the haplotypes of an individual directly, but it is relatively easy to obtain the conflated SNP information of both copies, the so-called genotype. Computational methods for genotype phasing, i.e., inferring haplotypes from genotype data, have received much attention in recent years, as haplotype information increases the statistical power of disease association tests. However, many existing algorithms have impractical running times on large genotype datasets such as those generated by the International HapMap Project. In this paper we propose a highly scalable algorithm based on entropy minimization. Our algorithm can phase both unrelated genotypes and related genotypes coming from complex pedigrees. Experimental results on both real and simulated datasets show that our algorithm achieves a phasing accuracy slightly below, but close to, that of the best existing methods, while being several orders of magnitude faster. The open source implementation of the algorithm and a web interface are publicly available at http://dna.engr.uconn.edu/~software/ent/.
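To make the idea of phasing by entropy minimization concrete, the following is a minimal, brute-force Python sketch of the objective only. It is not the authors' scalable algorithm. It assumes a hypothetical 0/1/2 genotype encoding (0 and 2 are homozygous for allele 0 and allele 1, 1 is heterozygous) and scores each candidate phasing by the Shannon entropy of the resulting haplotype frequency distribution, keeping the phasing with the lowest entropy. The function names `resolutions`, `entropy` and `phase_min_entropy` are illustrative. Exhaustive enumeration is exponential in the number of heterozygous sites and genotypes, so it only works for toy inputs.

```python
import itertools
import math
from collections import Counter


def resolutions(genotype):
    """Yield every unordered haplotype pair (h1, h2) consistent with a genotype.

    Assumed encoding: 0 / 2 = homozygous for allele 0 / 1, 1 = heterozygous.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    if not het:
        yield tuple(base), tuple(base)
        return
    # Fix the first heterozygous site to allele 0 on h1 so each unordered pair is produced once.
    for bits in itertools.product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        for site, b in zip(het, (0,) + bits):
            h1[site], h2[site] = b, 1 - b
        yield tuple(h1), tuple(h2)


def entropy(haplotypes):
    """Shannon entropy (bits) of the haplotype frequency distribution."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def phase_min_entropy(genotypes):
    """Exhaustively pick one resolution per genotype minimizing haplotype entropy."""
    best, best_h = None, math.inf
    for choice in itertools.product(*(list(resolutions(g)) for g in genotypes)):
        haps = [h for pair in choice for h in pair]
        h = entropy(haps)
        if h < best_h:
            best, best_h = list(choice), h
    return best, best_h


if __name__ == "__main__":
    # Toy population: one double heterozygote plus two homozygotes.
    phasing, h = phase_min_entropy([(1, 1), (0, 0), (2, 2)])
    print(phasing, h)
```

On this toy input, the double heterozygote (1, 1) is resolved as (0, 0)/(1, 1) rather than (0, 1)/(1, 0), because that choice reuses haplotypes already forced by the homozygous individuals. This yields two distinct haplotypes with entropy 1.0 bit instead of four with about 1.92 bits. This preference for few, frequent haplotypes is the intuition behind an entropy objective.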