Compact DFA Representation for Fast Regular Expression Search

Authors:
Gonzalo Navarro; Mathieu Raffinot
Venue:
WAE '01 Proceedings of the 5th International Workshop on Algorithm Engineering
Year:
2001

Abstract

We present a new technique to encode a deterministic finite automaton (DFA). Based on the specific properties of Glushkov's nondeterministic finite automaton (NFA) construction algorithm, we are able to encode the DFA using $(m+1)(2^{m+1} + |\Sigma|)$ bits, where m is the number of characters (excluding operator symbols) in the regular expression and Σ is the alphabet. This compares favorably against the worst case of $(m+1)\,2^{m+1}\,|\Sigma|$ bits needed by a classical DFA representation and $m(2^{2m+1} + |\Sigma|)$ bits needed by the Wu and Manber approach implemented in Agrep. Our approach is practical and simple to implement, and it permits searching regular expressions of moderate size (which include most cases of interest) faster than with any previously existing algorithm, as we show experimentally.
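
To make the three space bounds in the abstract concrete, the short Python sketch below evaluates each formula for a few pattern sizes. It only restates the arithmetic of the abstract and does not implement the encoding itself. The function names and the choice of a 256-symbol alphabet are assumptions made for illustration.

```python
def compact_bits(m: int, sigma: int) -> int:
    """Bits for the proposed encoding: (m+1)(2^(m+1) + |Sigma|)."""
    return (m + 1) * (2 ** (m + 1) + sigma)


def classical_dfa_bits(m: int, sigma: int) -> int:
    """Worst-case bits for a classical DFA table: (m+1) 2^(m+1) |Sigma|."""
    return (m + 1) * 2 ** (m + 1) * sigma


def wu_manber_bits(m: int, sigma: int) -> int:
    """Bits for the Wu and Manber approach: m (2^(2m+1) + |Sigma|)."""
    return m * (2 ** (2 * m + 1) + sigma)


if __name__ == "__main__":
    sigma = 256  # assumed alphabet size (one byte per character)
    print(f"{'m':>3} {'compact':>14} {'classical':>16} {'Wu-Manber':>18}")
    for m in (4, 8, 12, 16):  # illustrative pattern sizes
        print(
            f"{m:>3} {compact_bits(m, sigma):>14,} "
            f"{classical_dfa_bits(m, sigma):>16,} "
            f"{wu_manber_bits(m, sigma):>18,}"
        )
```

For example, with m = 8 and |Σ| = 256 the formulas give 6,912 bits for the compact encoding, 1,179,648 bits for the classical worst case, and 1,050,624 bits for the Wu and Manber approach.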