We study the problem of reconstructing haplotype configurations from genotypes on pedigree data with missing alleles, under the Mendelian law of inheritance and the minimum recombination principle. This problem is important for the construction of haplotype maps and for genetic linkage/association analysis. Our previous results show that finding a minimum-recombinant haplotype configuration (MRHC) is NP-hard in general. The existing algorithms for MRHC are either heuristic in nature and cannot guarantee optimality, or work only under some restrictions (for example, on the size and structure of the input pedigree, the number of marker loci, or the number of recombinants in the pedigree). In addition, most of them cannot handle data with missing alleles, and those that do consider missing data usually do not perform well in terms of minimizing the number of recombinants when a significant fraction of alleles is missing. In this paper, we develop an effective integer linear programming (ILP) formulation of the MRHC problem with missing data, together with a branch-and-bound strategy that uses a partial order relationship (and some other special relationships) among variables to decide the branching order. The partial order relationship is discovered while preprocessing the constraints, by exploiting properties unique to our ILP formulation. A directed graph is built from the variables and their partial order relationship. By identifying and collapsing the strongly connected components of this graph, we may greatly reduce the size of an ILP instance. Non-trivial lower and upper bounds on the optimal number of recombinants are introduced at each branching node to prune the search tree effectively. When multiple solutions exist, a best haplotype configuration is selected using a maximum likelihood approach. Our results on simulated data show that the algorithm can recover haplotypes with 50 loci from a pedigree of size 29 in seconds on a standard PC. Its accuracy is more than 99.8% for data with no missing alleles and 98.3% for data with 20% missing alleles in terms of correctly recovered phase information at each marker locus. As an application of our algorithm to real data, we present some test results on reconstructing haplotypes from a genome-scale SNP data set consisting of 12 pedigrees that have 0.8% to 14.5% missing alleles.
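The graph-reduction step mentioned in the abstract can be illustrated with a minimal sketch. It rests on an assumption that the abstract does not state: each directed edge (u, v) is taken to mean a constraint of the form x_u <= x_v between variables, so that all variables on a directed cycle must be equal and can be merged into one. The function name `collapse_sccs` and the edge encoding are hypothetical, and the sketch uses Kosaraju's two-pass algorithm as a generic way to find strongly connected components; it is not the paper's own preprocessing procedure.

```python
from collections import defaultdict


def collapse_sccs(num_vars, edges):
    """Merge variables that lie in the same strongly connected component.

    Assumption (not from the paper): an edge (u, v) encodes x_u <= x_v,
    so variables on a common cycle are forced to be equal.
    Returns (rep, reduced_edges), where rep[i] is the representative
    variable of i and reduced_edges links distinct representatives.
    """
    fwd = defaultdict(list)
    rev = defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)

    # Pass 1: iterative DFS on the graph, recording finishing order.
    visited = [False] * num_vars
    order = []
    for start in range(num_vars):
        if visited[start]:
            continue
        visited[start] = True
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if not visited[nxt]:
                    visited[nxt] = True
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                # All successors explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing finishing time;
    # each sweep collects exactly one strongly connected component.
    rep = [-1] * num_vars
    for root in reversed(order):
        if rep[root] != -1:
            continue
        rep[root] = root
        stack = [root]
        while stack:
            node = stack.pop()
            for prev in rev[node]:
                if rep[prev] == -1:
                    rep[prev] = root
                    stack.append(prev)

    # Keep only edges between different components, without duplicates.
    reduced_edges = sorted(
        {(rep[u], rep[v]) for u, v in edges if rep[u] != rep[v]}
    )
    return rep, reduced_edges


# Toy instance: variables 0, 1, 2 form one cycle and 3, 4 form another.
rep, reduced_edges = collapse_sccs(
    5, [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3)]
)
print(len(set(rep)), "variables remain;", len(reduced_edges), "edge remains")
```

In this toy instance, five variables shrink to two merged variables joined by a single remaining edge. Under the stated assumption, the same merging would shrink an ILP instance by replacing each component with one variable.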