Approximate regular expression matching with multi-strings

Authors:
Djamal Belazzougui;Mathieu Raffinot
Affiliations:
LIAFA, Univ. Paris Diderot - Paris 7, Paris Cedex, France;LIAFA, Univ. Paris Diderot - Paris 7, Paris Cedex, France
Venue:
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
Year:
2011

Abstract

In this paper, we are interested in solving the approximate regular expression matching problem: we are given a regular expression R in advance and we wish to answer the following query: given a text T and a parameter k, find all the substrings of T which match the regular expression R with at most k errors (an error consists of deleting, inserting, or substituting a character). There exists a well-known solution to this problem that runs in time O(mn), where m is the size of the regular expression (the number of operators and characters appearing in R) and n is the length of the text. There also exists a solution for the case k = 0 (exact regular expression matching) which solves the problem in time O(dn), where d is the number of strings in the regular expression (a string is a sequence of characters connected by the concatenation operator). In this paper, we show that both methods can be combined to solve the approximate regular expression matching problem in time O(kdn) for arbitrary k. This bound can be much better than the bound O(mn / log_{k+2} n) achieved by the best currently known regular expression matching algorithm when d ≪ m / (k log_{k+2} n) (that is, k is not too large and R contains far fewer occurrences of ∪ and * than occurrences of the concatenation operator ·).
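
To make the query concrete, the following minimal sketch handles only the degenerate case in which R is a single string (d = 1, no ∪ or *). It uses the classical column-by-column dynamic programming for approximate string matching, which runs in O(mn) time; it is not the O(kdn) algorithm of the paper. The function name `approx_end_positions` and the choice to report end positions rather than full substrings are assumptions made for illustration.

```python
def approx_end_positions(pattern: str, text: str, k: int) -> list[int]:
    """Return the 0-based indices j such that some substring of `text`
    ending at j matches `pattern` with at most k errors (one error is a
    character deletion, insertion, or substitution).

    Illustrative special case only: the "regular expression" is one string.
    """
    m = len(pattern)
    # col[i] = minimum number of errors needed to match pattern[:i]
    # against some substring of text ending at the current position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # value of col[i-1] in the previous column
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]    # value of col[i] in the previous column
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(
                prev_diag + cost,  # match or substitution
                old + 1,           # extra character in the text
                col[i - 1] + 1,    # pattern character missing from the text
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approx_end_positions("abc", "xxabcxx", 0))
    print(approx_end_positions("abc", "xxabcxx", 1))
```

With k = 0 this reduces to exact matching. With k = 1 it also reports positions where a match is one edit away, for instance where only the prefix "ab" has been read.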