Approximate regional sequence matching for genomic databases

Authors:
Thanasis Vergoulis;Theodore Dalamagas;Dimitris Sacharidis;Timos Sellis
Affiliations:
NTUA & IMIS, Athena RC, Athens, Greece;IMIS, Athena RC, Athens, Greece;IMIS, Athena RC, Athens, Greece;NTUA & IMIS, Athena RC, Athens, Greece
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2012

Abstract

Recent advances in computational biology have raised sequence matching requirements that result in new types of sequence database problems. In this work, we introduce an important class of such problems, the approximate regional sequence matching (ARSM) problem. Given a data sequence and a pattern sequence, an ARSM result is an approximate occurrence of a region of the pattern in the data sequence, subject to two conditions. First, the region must contain a predetermined area of the pattern sequence, termed the core. Second, the allowable deviation between the region of the pattern and its occurrence in the data sequence depends on the length of the region. We propose the PS-ARSM method, which processes the regions of a pattern holistically, taking advantage of their overlaps to identify the ARSM results efficiently. Its performance is evaluated against existing techniques adapted to the ARSM problem.
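
To make the problem statement concrete, the following is a minimal brute-force sketch of the ARSM definition. It is not the PS-ARSM method, which the abstract does not describe in detail. It rests on two assumptions that the abstract does not state: deviation is measured by edit distance, and the allowed number of errors for a region is floor(error_rate * region length). The names `approx_end_positions`, `naive_arsm`, `core_start`, `core_end` and `error_rate` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def approx_end_positions(region: str, data: str, k: int) -> List[int]:
    """Return end offsets j (exclusive) such that some substring of `data`
    ending at j is within edit distance k of `region`.

    Classic column-wise dynamic programming for approximate matching
    (Sellers' algorithm). Assumption: deviation is edit distance.
    """
    m = len(region)
    prev = list(range(m + 1))  # column for the empty data prefix
    ends = []
    for j, ch in enumerate(data, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: an occurrence may start anywhere
        for i in range(1, m + 1):
            cost = 0 if region[i - 1] == ch else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra character in data
                         cur[i - 1] + 1)      # missing character in data
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends


def naive_arsm(pattern: str, data: str, core_start: int, core_end: int,
               error_rate: float) -> List[Tuple[int, int, int, int]]:
    """Enumerate every region pattern[s:e] that contains the core
    pattern[core_start:core_end] and search each one independently.

    Returns tuples (region_start, region_end, data_end, allowed_errors).
    Assumption: allowed errors = floor(error_rate * region length).
    """
    results = []
    for s in range(core_start + 1):                  # start at or before the core
        for e in range(core_end, len(pattern) + 1):  # end at or after the core
            region = pattern[s:e]
            k = int(error_rate * len(region))
            for end in approx_end_positions(region, data, k):
                results.append((s, e, end, k))
    return results


if __name__ == "__main__":
    # Core is "CG" (pattern[1:3]); regions are "CG", "ACG", "CGT", "ACGT".
    for hit in naive_arsm("ACGT", "TTACGTTT", 1, 3, 0.25):
        print(hit)
```

This baseline searches each region separately and so repeats work on the parts that regions share. Those overlaps are what the abstract says PS-ARSM exploits by processing the regions holistically.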