On-line approximate string matching with bounded errors

Authors:
Marcos Kiwi;Gonzalo Navarro;Claudio Telha
Affiliations:
Departamento de Ingeniería Matemática & Centro de Modelamiento Matemático UMI 2807 CNRS-UChile, University of Chile, Chile;Department of Computer Science, University of Chile, Chile;Operations Research Center, MIT, United States
Venue:
Theoretical Computer Science
Year:
2011

Abstract

We add a new dimension to the widely studied on-line approximate string matching problem by introducing an error threshold parameter $\varepsilon$, so that the algorithm is allowed to miss occurrences with probability $\varepsilon$. This is particularly appropriate for this problem, as approximate searching is used to model many cases where exact answers are not mandatory. We show that the relaxed version of the problem allows us to break the average-case optimal lower bound of the classical problem, achieving average-case $O(n \log_\sigma m / m)$ time with any $\varepsilon = \mathrm{poly}(k/m)$, where $n$ is the text size, $m$ the pattern length, $k$ the number of differences for edit distance, and $\sigma$ the alphabet size. Our experimental results show the practicality of this novel and promising research direction. Finally, we extend the proposed approach to the multiple approximate string matching setting, where the approximate occurrences of $r$ patterns are sought simultaneously. Again, we can break the average-case optimal lower bound of the classical problem, achieving average-case $O(n \log_\sigma (rm) / m)$ time with any $\varepsilon = \mathrm{poly}(k/m)$.
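
To make the problem statement concrete, here is a minimal sketch of the classical exact formulation: report every text position where the pattern ends with at most k edit-distance differences. It uses the standard column-wise dynamic programming approach, which takes O(mn) time and never misses an occurrence. It is not the algorithm proposed in the paper, and the function name `approx_search` is an illustrative assumption.

```python
def approx_search(pattern: str, text: str, k: int) -> list[int]:
    """Return 0-based end positions j such that some substring of text
    ending at j is within edit distance k of pattern.

    Classical O(m*n) dynamic programming baseline (illustrative only;
    not the bounded-error algorithm described in the abstract).
    """
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any
    # substring of the text ending at the current position.
    # col[0] stays 0 because a match may start anywhere in the text.
    col = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text):
        prev_diag = col[0]  # value of col[i-1] from the previous text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new_val = min(
                prev_diag + cost,  # match or substitution
                col[i] + 1,        # extra character in the text
                col[i - 1] + 1,    # character of the pattern left unmatched
            )
            prev_diag = col[i]
            col[i] = new_val
        if col[m] <= k:
            hits.append(j)
    return hits


if __name__ == "__main__":
    print(approx_search("abc", "xxabcxx", 0))
    print(approx_search("abc", "xxabdxx", 1))
```

In this sketch `len(text)` plays the role of n, `len(pattern)` the role of m, and `k` the number of allowed differences. The relaxed problem in the abstract permits each such occurrence to be missed with probability at most $\varepsilon$ in exchange for a faster average-case running time.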