A Single Nucleotide Polymorphism (SNP) is a position in the genome at which two or more of the four possible nucleotides occur in a large percentage of the population. SNPs account for most of the genetic variability between individuals, and mapping SNPs in the human population has become the next high priority in genomics after the completion of the Human Genome Project. In diploid organisms such as humans, there are two non-identical copies of each autosomal chromosome. A description of the SNPs in a chromosome is called a haplotype. At present, it is prohibitively expensive to directly determine the haplotypes of an individual, but it is possible to obtain rather easily the conflated SNP information in the so-called genotype. Computational methods for genotype phasing, i.e., inferring haplotypes from genotype data, have received much attention in recent years, as haplotype information leads to increased statistical power of disease association tests. However, many of the existing algorithms have impractical running times for phasing large genotype datasets such as those generated by the International HapMap Project. In this paper we propose a highly scalable algorithm based on entropy minimization. Our algorithm is capable of phasing both unrelated and related genotypes coming from complex pedigrees. Experimental results on both real and simulated datasets show that our algorithm achieves a phasing accuracy worse than, but close to, that of the best existing methods while being several orders of magnitude faster. The open-source implementation of the algorithm and a web interface are publicly available at http://dna.engr.uconn.edu/~software/ent/.
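
To make the terms in the abstract concrete, the following is a minimal Python sketch of how two haplotypes conflate into a genotype and of what an entropy-minimization objective for phasing could look like. It is an illustration only, not the algorithm proposed in the paper: the 0/1/2 genotype encoding, the function names, and the brute-force search are all assumptions introduced here, and the search is exponential in the number of heterozygous sites, so it suits only toy inputs.

```python
from collections import Counter
from itertools import product
from math import log2


def genotype_of(h1, h2):
    """Conflate two haplotypes (strings over '0'/'1') into a genotype.

    Assumed encoding: '0' or '1' = homozygous for that allele,
    '2' = heterozygous (the two haplotypes differ at this SNP).
    """
    return "".join(a if a == b else "2" for a, b in zip(h1, h2))


def compatible_pairs(genotype):
    """List every unordered haplotype pair that explains the genotype."""
    het = [i for i, g in enumerate(genotype) if g == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het)):
        h1, h2 = list(genotype), list(genotype)
        for i, b in zip(het, bits):
            h1[i] = b
            h2[i] = "1" if b == "0" else "0"  # complementary allele
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


def phasing_entropy(phasing):
    """Shannon entropy (in bits) of haplotype frequencies in a phasing."""
    counts = Counter(h for pair in phasing for h in pair)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def min_entropy_phasing(genotypes):
    """Brute-force the phasing with the lowest haplotype entropy (toy only)."""
    options = [compatible_pairs(g) for g in genotypes]
    return min(product(*options), key=phasing_entropy)


if __name__ == "__main__":
    genotypes = ["22", "00", "11"]
    best = min_entropy_phasing(genotypes)
    print(best, phasing_entropy(best))
```

In this toy input the doubly heterozygous genotype `22` is ambiguous: it can be explained by the pair `00`/`11` or by `01`/`10`. Because the other two individuals already carry `00` and `11`, reusing those haplotypes gives the lower-entropy explanation, which is the intuition behind preferring phasings built from few, frequently occurring haplotypes.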