Haplotypes are more useful in complex disease gene mapping than single-nucleotide polymorphisms (SNPs). However, haplotypes are difficult to obtain directly through biological experiments, which has prompted research into efficient computational methods for determining them. Minimum Letter Flip (MLF) is an individual haplotyping problem: given a set of aligned DNA sequence fragments from one individual, infer the corresponding haplotypes by flipping the minimum number of SNP values. There has been no practical exact algorithm for solving the problem. Due to technical limits in DNA sequencing experiments, the maximum length of a fragment sequenced directly is about 1 kb. Consequently, with a genome-average SNP density of 1.84 SNPs per 1 kb of DNA sequence, the maximum number $k_1$ of SNP sites that a fragment covers is usually small. Moreover, to save time and money, the maximum number $k_2$ of fragments that cover an SNP site is usually no more than 19. Building on these properties of the fragment data, the current paper introduces a new parameterised algorithm with running time $O(n k_2 2^{k_2} + m \log m + m k_1)$, where $m$ is the number of fragments and $n$ is the number of SNP sites. In practical biological applications, the algorithm solves the MLF problem efficiently even when $m$ and $n$ are large.
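To make the MLF objective concrete, the following is a minimal sketch of a brute-force solver for very small inputs. It is not the parameterised algorithm described above: it simply tries every split of the fragments into two groups and counts, per SNP site, the minority values that would have to be flipped. The function name `mlf_brute_force`, the string encoding of fragments, and the use of `-` for an uncovered site are assumptions made for illustration only.

```python
from itertools import product


def mlf_brute_force(fragments):
    """Exhaustively solve MLF on a tiny fragment matrix (illustrative only).

    fragments: non-empty list of equal-length strings over '0', '1', '-'
               ('-' is an assumed marker for "site not covered").
    Returns (min_flips, (hap_a, hap_b)).
    Runs in time exponential in the number of fragments.
    """
    m = len(fragments)
    n = len(fragments[0])

    def consensus(group):
        # For each site, keep the majority value; minority values are flips.
        flips = 0
        hap = []
        for j in range(n):
            zeros = sum(1 for f in group if f[j] == "0")
            ones = sum(1 for f in group if f[j] == "1")
            hap.append("1" if ones > zeros else "0")
            flips += min(zeros, ones)
        return flips, "".join(hap)

    best = None
    # Fragment 0 is fixed in group 0 so mirrored splits are not tried twice.
    for rest in product((0, 1), repeat=m - 1):
        labels = (0,) + rest
        group_a = [f for f, lab in zip(fragments, labels) if lab == 0]
        group_b = [f for f, lab in zip(fragments, labels) if lab == 1]
        cost_a, hap_a = consensus(group_a)
        cost_b, hap_b = consensus(group_b)
        if best is None or cost_a + cost_b < best[0]:
            best = (cost_a + cost_b, (hap_a, hap_b))
    return best


if __name__ == "__main__":
    example = ["0101", "01-1", "1010", "1-10", "0111"]
    flips, haplotypes = mlf_brute_force(example)
    print(flips, haplotypes)
```

In the example, the fragments `0101`, `1010` and `0111` conflict pairwise, so no split into two groups is error-free and at least one flip is required. The exhaustive search over $2^{m-1}$ splits is only feasible for a handful of fragments, which is why an algorithm parameterised by the small quantities $k_1$ and $k_2$ matters in practice.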