New and faster filters for multiple approximate string matching

Authors:
Ricardo Baeza-Yates; Gonzalo Navarro
Affiliations:
Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile; Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile
Venue:
Random Structures & Algorithms
Year:
2002

Abstract

We present three new algorithms for on-line multiple string matching allowing errors. Each extends a previous algorithm that searches for a single pattern. In all cases the average running time is linear in the text size for moderate error levels, pattern lengths, and numbers of patterns; outside that range the algorithms still apply, at higher cost. The algorithms differ, however, in speed and in their thresholds of usefulness. We analyze theoretically when each algorithm should be used and show their performance experimentally. The only previous solution to this problem allows just one error. Our algorithms are the first to allow more errors, and they are faster than previous work for a moderate number of patterns (e.g., fewer than 50-100 on English text, depending on the pattern length).
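
To make the problem statement concrete, the following is a minimal sketch of a naive baseline, not one of the three filtering algorithms of the paper. It assumes that "errors" means unit-cost edit operations (insertion, deletion, substitution) and runs the classical column-wise dynamic programming search once per pattern. The function names `approx_occurrences` and `multi_search` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def approx_occurrences(pattern: str, text: str, k: int) -> List[int]:
    """End positions (0-based) in text where pattern matches with <= k errors.

    Assumption: errors are unit-cost insertions, deletions, substitutions.
    Classical dynamic programming, one column per text character.
    """
    m = len(pattern)
    # col[i] = least edit distance between pattern[:i] and any substring
    # of the text ending at the current position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # value of the previous column at row i - 1
        # col[0] stays 0 because a match may start at any text position.
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost, old + 1, col[i - 1] + 1)
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


def multi_search(patterns: Sequence[str], text: str, k: int) -> List[Tuple[int, int]]:
    """Naive multipattern search: one pass over the text per pattern.

    Returns sorted (end_position, pattern_index) pairs.
    """
    hits = []
    for idx, p in enumerate(patterns):
        hits.extend((end, idx) for end in approx_occurrences(p, text, k))
    return sorted(hits)
```

This baseline costs time proportional to the text length times the total pattern length, for every error level. The abstract's claim of average time linear in the text size for moderate parameters is a claim about doing substantially better than this per-pattern scan.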