Each person's genome contains two copies of each chromosome, one inherited from the father and the other from the mother. A person's genotype specifies the pair of bases at each site, but does not specify which base occurs on which chromosome. The sequence of each chromosome separately is called a haplotype. The determination of the haplotypes within a population is essential for understanding genetic variation and the inheritance of complex diseases. The haplotype mapping project, a successor to the human genome project, seeks to determine the common haplotypes in the human population. Since experimental determination of a person's genotype is less expensive than determining its component haplotypes, algorithms are required for computing haplotypes from genotypes. Two observations aid in this process: first, the human genome contains short blocks within which only a few different haplotypes occur; second, as suggested by Gusfield, it is reasonable to assume that the haplotypes observed within a block have evolved according to a perfect phylogeny, in which at most one mutation event has occurred at any site. We present a simple and efficient polynomial-time algorithm for inferring haplotypes from the genotypes of a set of individuals, assuming a perfect phylogeny. Using a reduction to 2-SAT, we extend this algorithm to handle the constraints that apply when we have genotypes from both parents and their child. We also present a hardness result for the problem of removing the minimum number of individuals from a population to ensure that the genotypes of the remaining individuals are consistent with a perfect phylogeny. Our algorithms have been tested on real data and give biologically meaningful results.
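To make the problem statement concrete, the following is a minimal Python sketch of the objects involved. It is not the polynomial-time algorithm announced above. It assumes a common binary encoding (haplotype sites are 0 or 1, and a genotype site is 0 or 1 when homozygous and 2 when heterozygous), and it checks the perfect-phylogeny condition with the standard four-gamete test, which applies when the root of the tree is unknown. The function names `genotype_of`, `resolutions`, `admits_perfect_phylogeny`, and `brute_force_pph` are hypothetical, and the search is exponential, so it is only suitable for tiny illustrative inputs.

```python
from itertools import combinations, product


def genotype_of(h1, h2):
    """Combine two haplotypes into a genotype (2 marks a heterozygous site)."""
    return tuple(a if a == b else 2 for a, b in zip(h1, h2))


def resolutions(genotype):
    """Yield every unordered haplotype pair that explains a genotype."""
    het = [i for i, g in enumerate(genotype) if g == 2]
    if not het:
        yield tuple(genotype), tuple(genotype)
        return
    # Fix the first heterozygous site to 0 in h1 to skip mirrored duplicates.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(genotype), list(genotype)
        for site, b in zip(het, (0,) + bits):
            h1[site], h2[site] = b, 1 - b
        yield tuple(h1), tuple(h2)


def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test: no pair of sites may show all of 00, 01, 10, 11."""
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        if len({(h[i], h[j]) for h in haplotypes}) == 4:
            return False
    return True


def brute_force_pph(genotypes):
    """Try every combination of resolutions (exponential; illustration only)."""
    for choice in product(*(list(resolutions(g)) for g in genotypes)):
        haplotypes = [h for pair in choice for h in pair]
        if admits_perfect_phylogeny(haplotypes):
            return list(choice)
    return None


if __name__ == "__main__":
    # The first individual is heterozygous at both sites; the other two
    # individuals rule out the 00/11 resolution.
    print(brute_force_pph([(2, 2), (0, 1), (1, 0)]))
    # [((0, 1), (1, 0)), ((0, 1), (0, 1)), ((1, 0), (1, 0))]
```

In this example the doubly heterozygous genotype has two candidate resolutions, and only the pair 01/10 keeps the whole population free of a four-gamete violation. A polynomial-time method has to reach such decisions without enumerating every combination.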