The study of variation in DNA sequences, for instance within the framework of phylogeny or population genetics, is one of the most important subjects in modern genomics. We present a new linear-time algorithm for finding maximal k-regions in alignments of three sequences. It can be used to detect segments that share a given degree of similarity, as well as the boundaries of distinct genomic environments such as gene clusters or haplotype blocks. A k-region is a region of the alignment that has a center sequence whose Hamming distance from each alignment row is at most k; determining such regions in the general case is known to be NP-hard.
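To make the definition concrete, the following Python sketch tests whether a block of alignment columns has a center sequence within Hamming distance k of every row, and then lists the maximal k-regions by direct search. It is not the linear-time algorithm announced in the abstract, which is not described here. The function names `has_center` and `maximal_k_regions` and the half-open `(start, end)` column convention are assumptions made for illustration, and gap characters are treated like any other symbol.

```python
def has_center(rows, k):
    """Return True if some center string lies within Hamming distance k
    of every row. Illustrative check only, not the paper's method."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("alignment rows must have equal length")
    # Each state is a tuple of distances accumulated so far, one per row.
    states = {(0,) * len(rows)}
    for column in zip(*rows):
        next_states = set()
        # A center character absent from the column costs every row 1,
        # so only characters present in the column need to be tried.
        for c in set(column):
            step = tuple(int(c != x) for x in column)
            for s in states:
                t = tuple(a + b for a, b in zip(s, step))
                if max(t) <= k:
                    next_states.add(t)
        states = next_states
        if not states:
            return False
    return True


def maximal_k_regions(rows, k):
    """List maximal k-regions as half-open column intervals (start, end).

    Simple repeated search, far from linear time. It relies on the fact
    that every sub-interval of a k-region is itself a k-region.
    """
    n = len(rows[0])
    regions = []
    prev_end = 0
    for start in range(n):
        end = start
        # Extend to the right while the block still admits a center.
        while end < n and has_center([r[start:end + 1] for r in rows], k):
            end += 1
        # Keep the interval only if it is non-empty and is not contained
        # in an interval found from an earlier start.
        if end > start and end > prev_end:
            regions.append((start, end))
            prev_end = end
    return regions


alignment = ["ACGT", "ACGA", "TCGT"]
print(maximal_k_regions(alignment, 0))
print(maximal_k_regions(alignment, 1))
```

For this toy alignment, only the two middle columns are identical across all rows, so they form the single maximal 0-region. With k = 1 the whole alignment qualifies, because the center ACGT differs from each row in at most one position.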