Recent advances in computational biology have raised sequence matching requirements that give rise to new types of sequence database problems. In this work, we introduce an important class of such problems, the approximate regional sequence matching (ARSM) problem. Given a data sequence and a pattern sequence, an ARSM result is an approximate occurrence of a region of the pattern in the data sequence, subject to two conditions. First, the region must contain a predetermined area of the pattern sequence, termed the core. Second, the allowable deviation between the region of the pattern and its occurrence in the data sequence depends on the length of the region. We propose the PS-ARSM method, which processes the regions of a pattern holistically, taking advantage of their overlaps to identify the ARSM results efficiently. Its performance is evaluated against existing techniques adapted to the ARSM problem.
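To make the problem definition concrete, the following is a minimal brute-force sketch of ARSM, not the PS-ARSM method itself. It rests on several assumptions that the abstract does not state: deviation is measured as edit distance, the length-dependent error bound is supplied by the caller as a hypothetical function `max_errors`, and the core is given as the half-open index range `core_start:core_end` of the pattern. The names `best_end_distances` and `naive_arsm` are illustrative only. Each region is searched independently with Sellers' dynamic program, which is the redundant work that a method exploiting region overlaps would aim to avoid.

```python
from typing import Callable, List, Tuple


def best_end_distances(region: str, data: str) -> List[int]:
    """Sellers' dynamic program.

    Entry j of the result is the minimum edit distance between `region`
    and any substring of `data` that ends just before index j.
    """
    # Row for the empty region prefix: an occurrence may start anywhere,
    # so every position costs 0.
    prev = [0] * (len(data) + 1)
    for i, rc in enumerate(region, start=1):
        # Against an empty substring, i region characters are unmatched.
        curr = [i] + [0] * len(data)
        for j, dc in enumerate(data, start=1):
            curr[j] = min(
                prev[j - 1] + (rc != dc),  # match or substitution
                prev[j] + 1,               # region character left unmatched
                curr[j - 1] + 1,           # extra data character
            )
        prev = curr
    return prev


def naive_arsm(
    pattern: str,
    data: str,
    core_start: int,
    core_end: int,
    max_errors: Callable[[int], int],  # assumed: region length -> error bound
) -> List[Tuple[int, int, int, int]]:
    """Return (region_start, region_end, data_end, distance) tuples.

    A region is pattern[s:e] with s <= core_start and e >= core_end,
    so every region contains the core pattern[core_start:core_end].
    """
    results = []
    for s in range(core_start + 1):
        for e in range(core_end, len(pattern) + 1):
            k = max_errors(e - s)  # bound depends on the region length
            distances = best_end_distances(pattern[s:e], data)
            for j, d in enumerate(distances):
                if d <= k:
                    results.append((s, e, j, d))
    return results
```

For example, `naive_arsm("ACGT", "TTCGTT", 1, 3, lambda n: n // 4)` treats `"CG"` as the core and allows one error per four region characters; that particular bound is only an illustration. The sketch reports end positions in the data sequence only, and its cost grows with the number of regions times the product of region and data lengths.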