Bit-coded regular expression parsing

Authors:
Lasse Nielsen;Fritz Henglein
Affiliations:
DIKU, University of Copenhagen, Denmark;DIKU, University of Copenhagen, Denmark
Venue:
LATA'11 Proceedings of the 5th international conference on Language and automata theory and applications
Year:
2011

Citing (8)

Compact coding of syntactically correct source programs. Software—Practice & Experience
Efficiently building a parse tree from a regular expression. Acta Informatica
Polytypic Compact Printing and Parsing. ESOP '99 Proceedings of the 8th European Symposium on Programming Languages and Systems
Regular expression types for XML. ACM Transactions on Programming Languages and Systems (TOPLAS)
Type inference for unique pattern matching. ACM Transactions on Programming Languages and Systems (TOPLAS)
Faster Regular Expression Matching. ICALP '09 Proceedings of the 36th International Colloquium on Automata, Languages and Programming: Part I
Rex: Symbolic Regular Expression Explorer. ICST '10 Proceedings of the 2010 Third International Conference on Software Testing, Verification and Validation
Typed and unambiguous pattern matching on strings using regular expressions. Proceedings of the 12th international ACM SIGPLAN symposium on Principles and practice of declarative programming

Cited by (1)

Two-Pass greedy regular expression parsing. CIAA'13 Proceedings of the 18th international conference on Implementation and Application of Automata

Abstract

Regular expression parsing is the problem of producing a parse tree of a string for a given regular expression. We show that a compact bit representation of a parse tree can be produced efficiently, in time linear in the product of input string size and regular expression size, by simplifying the DFA-based parsing algorithm due to Dubé and Feeley to emit the bits of the bit representation without explicitly materializing the parse tree itself. We furthermore show that Frisch and Cardelli's greedy regular expression parsing algorithm can be straightforwardly modified to produce bit codings directly. We implement both solutions as well as a backtracking parser and perform benchmark experiments to gauge their practical performance. We observe that our DFA-based solution can be significantly more time and space efficient than the Frisch-Cardelli algorithm due to its sharing of DFA-nodes, but that the latter may still perform better on regular expressions that are "more deterministic" from the right than the left. (Backtracking is, unsurprisingly, quite hopeless.)
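
To make the idea of a bit-coded parse tree concrete, the following is a minimal sketch of one possible coding scheme, not the authors' implementation. It assumes a small regular expression AST (`Eps`, `Chr`, `Alt`, `Seq`, `Star`) and an ad hoc tree representation; all of these names are hypothetical. It also assumes a particular convention: an alternative emits one bit (0 for left, 1 for right), and a star emits 0 before each iteration and 1 to stop. The paper's exact convention may differ. The point illustrated is that, given the regular expression, only the choices need to be recorded, so the bits alone suffice to rebuild the tree.

```python
from dataclasses import dataclass


# Hypothetical regular expression AST (names are illustrative only).
@dataclass(frozen=True)
class Eps:
    pass


@dataclass(frozen=True)
class Chr:
    c: str


@dataclass(frozen=True)
class Alt:
    left: object
    right: object


@dataclass(frozen=True)
class Seq:
    first: object
    second: object


@dataclass(frozen=True)
class Star:
    body: object


# Parse trees (assumed representation):
#   Eps  -> ()                 Chr -> the character itself
#   Alt  -> ("inl", v) or ("inr", v)
#   Seq  -> (v, w)             Star -> list of subtrees


def encode(regex, tree):
    """Return the bit string recording only the choices made in `tree`."""
    if isinstance(regex, (Eps, Chr)):
        return ""  # no choice was made, so no bits are needed
    if isinstance(regex, Alt):
        tag, v = tree
        if tag == "inl":
            return "0" + encode(regex.left, v)
        return "1" + encode(regex.right, v)
    if isinstance(regex, Seq):
        v, w = tree
        return encode(regex.first, v) + encode(regex.second, w)
    if isinstance(regex, Star):
        # Assumed convention: 0 = one more iteration, 1 = stop.
        return "".join("0" + encode(regex.body, v) for v in tree) + "1"
    raise TypeError(f"unknown regex node: {regex!r}")


def decode(regex, bits, pos=0):
    """Rebuild the tree from `bits`; return (tree, next position)."""
    if isinstance(regex, Eps):
        return (), pos
    if isinstance(regex, Chr):
        return regex.c, pos
    if isinstance(regex, Alt):
        if bits[pos] == "0":
            v, pos = decode(regex.left, bits, pos + 1)
            return ("inl", v), pos
        v, pos = decode(regex.right, bits, pos + 1)
        return ("inr", v), pos
    if isinstance(regex, Seq):
        v, pos = decode(regex.first, bits, pos)
        w, pos = decode(regex.second, bits, pos)
        return (v, w), pos
    if isinstance(regex, Star):
        items = []
        while bits[pos] == "0":
            v, pos = decode(regex.body, bits, pos + 1)
            items.append(v)
        return items, pos + 1  # skip the terminating 1
    raise TypeError(f"unknown regex node: {regex!r}")


# Usage: the string "ab" parsed under (a + b)*.
regex = Star(Alt(Chr("a"), Chr("b")))
tree = [("inl", "a"), ("inr", "b")]
bits = encode(regex, tree)
print(bits)  # 00011
```

Under this assumed convention the two-character string needs five bits, and `decode(regex, bits)` returns the original tree together with the number of bits consumed. A parser of the kind the abstract describes would emit such bits while reading the input rather than building `tree` first; this sketch only shows the coding itself.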