Sublinear Expected Time Approximate String Matching and Biological

Authors:
William I. Chang;Eugene L. Lawler
Affiliations:
-;-
Venue:
Sublinear Expected Time Approximate String Matching and Biological
Year:
1991



Abstract

The k differences approximate string matching problem specifies a text string of length n, a pattern string of length m, and the number k of differences (substitutions, insertions, deletions) allowed in a match, and asks for all locations in the text where a match occurs. We treat k not as a constant but as a fraction of m (not necessarily a constant fraction). Previous algorithms require at least $O(kn)$ time (or else exponential space). We are interested in much faster algorithms for restricted cases of the problem, such as when the text string is random and the allowable error rate is not too high (a log-fraction). We have devised an algorithm that runs in sublinear expected time, $O((n/m)\,k \log_b m)$: when k is bounded by the threshold $m/\log_b m$, the expected running time is $o(n)$. In the worst case our algorithm takes $O(kn)$ time, but it is still an improvement in that it is practical and uses $O(m)$ space, compared to $O(n)$ or $O(m^2)$. We define three problems inspired by molecular biology and describe efficient algorithms based
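For context on the baseline the abstract compares against, here is a minimal sketch of the classical dynamic-programming solution to the k differences problem (Sellers' algorithm). It is not the authors' sublinear method: it inspects every text character, taking $O(mn)$ time with an $O(m)$-space column, while refined variants of the same recurrence reach the $O(kn)$ bound mentioned above. The function name `k_differences_ends` is hypothetical, and reporting 0-based end positions is an illustrative choice.

```python
def k_differences_ends(text: str, pattern: str, k: int) -> list[int]:
    """Return every 0-based index j in `text` such that some substring of
    `text` ending at j is within edit distance k of `pattern`.

    Sketch of the classical (Sellers-style) DP: O(m*n) time, O(m) space.
    """
    m = len(pattern)
    # col[i] holds D[i][j]: the minimum edit distance between pattern[:i]
    # and any substring of text ending just before position j.
    col = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # D[0][j-1]
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            cur = col[i]    # D[i][j-1], saved before it is overwritten
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(
                prev_diag + cost,  # match or substitution
                cur + 1,           # extra text character (insertion)
                col[i - 1] + 1,    # pattern character missing (deletion)
            )
            prev_diag = cur
        if col[m] <= k:
            ends.append(j)
    return ends


print(k_differences_ends("abcdefg", "cde", 0))  # [4]
print(k_differences_ends("abcdefg", "cde", 1))  # [3, 4, 5]
```

Keeping one column instead of the full m-by-n table is what yields the linear-in-m space usage. The sublinear-time approach described in the abstract avoids running this recurrence over text regions that provably cannot contain a match.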