Deterministic regular expressions in linear time

Authors:
Benoît Groz; Sebastian Maneth; Slawek Staworko
Affiliations:
INRIA and University of Lille, Lille, France; NICTA and UNSW, Sydney, Australia; INRIA and University of Lille, Lille, France
Venue:
PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
Year:
2012

Abstract

Deterministic regular expressions are widely used in XML processing. For instance, all regular expressions in DTDs and XML Schemas are required to be deterministic. In this paper we show that determinism of a regular expression e can be tested in linear time. The best previously known algorithms, based on the Glushkov automaton, require O(σ|e|) time, where σ is the number of distinct symbols in e. We further show that matching a word w against an expression e can be achieved in combined linear time O(|e| + |w|) for a wide range of deterministic regular expressions: (i) star-free expressions (for multiple input words), (ii) bounded-occurrence expressions, i.e., expressions in which each symbol appears a bounded number of times, and (iii) bounded plus-depth expressions, i.e., expressions in which the nesting depth of alternating plus (union) and concatenation symbols is bounded. Our algorithms use a new structural decomposition of the parse tree of e. For matching arbitrary deterministic regular expressions we present an O(|e| + |w| log log |e|) time algorithm.
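The Glushkov-based baseline mentioned in the abstract can be illustrated with a minimal Python sketch. It is the textbook position-automaton approach (nullable, first, last and follow sets computed over the parse tree), not the paper's linear-time structural decomposition, and every class and method name here (Eps, Sym, Alt, Cat, Star, Glushkov, is_deterministic, matches) is hypothetical. An expression is treated as deterministic when neither the first set nor any follow set contains two positions labelled with the same symbol; for such an expression, matching reads each input symbol by moving to the unique position carrying it. Preprocessing here is polynomial and each matching step scans a follow set, so the sketch does not attain the linear bounds stated above.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical AST for regular expressions over single-character symbols.
@dataclass(frozen=True)
class Eps:            # the empty word
    pass

@dataclass(frozen=True)
class Sym:            # one occurrence (position) of a symbol
    a: str

@dataclass(frozen=True)
class Alt:            # e1 + e2 (union, the "plus" of the abstract)
    l: "Regex"
    r: "Regex"

@dataclass(frozen=True)
class Cat:            # e1 e2 (concatenation)
    l: "Regex"
    r: "Regex"

@dataclass(frozen=True)
class Star:           # e* (Kleene star)
    e: "Regex"

Regex = Union[Eps, Sym, Alt, Cat, Star]


class Glushkov:
    """Textbook Glushkov (position) automaton of an expression; illustrative only."""

    def __init__(self, e: Regex) -> None:
        self.symbol: list[str] = []        # symbol[p] = letter at position p
        self.follow: list[set[int]] = []   # follow[p] = positions that may come right after p
        self.nullable, self.first, self.last = self._walk(e)

    def _walk(self, e):
        """Return (nullable, first, last) of e, filling follow sets on the way."""
        if isinstance(e, Eps):
            return True, set(), set()
        if isinstance(e, Sym):
            p = len(self.symbol)
            self.symbol.append(e.a)
            self.follow.append(set())
            return False, {p}, {p}
        if isinstance(e, Alt):
            n1, f1, l1 = self._walk(e.l)
            n2, f2, l2 = self._walk(e.r)
            return n1 or n2, f1 | f2, l1 | l2
        if isinstance(e, Cat):
            n1, f1, l1 = self._walk(e.l)
            n2, f2, l2 = self._walk(e.r)
            for p in l1:                   # a last position of e1 can be followed by a first of e2
                self.follow[p] |= f2
            first = (f1 | f2) if n1 else f1
            last = (l1 | l2) if n2 else l2
            return n1 and n2, first, last
        if isinstance(e, Star):
            n, f, l = self._walk(e.e)
            for p in l:                    # looping back: last positions are followed by first ones
                self.follow[p] |= f
            return True, f, l
        raise TypeError(f"not a regex node: {e!r}")

    def _no_clash(self, positions) -> bool:
        """True if no two positions in the set carry the same symbol."""
        letters = [self.symbol[p] for p in positions]
        return len(letters) == len(set(letters))

    def is_deterministic(self) -> bool:
        # Deterministic iff the initial state and every position have
        # at most one successor per symbol.
        return self._no_clash(self.first) and all(self._no_clash(f) for f in self.follow)

    def matches(self, word: str) -> bool:
        """Run the automaton on word; assumes is_deterministic() is True."""
        current = None                     # None stands for the initial state
        for a in word:
            candidates = self.first if current is None else self.follow[current]
            # By determinism at most one candidate carries symbol a.
            current = next((p for p in candidates if self.symbol[p] == a), None)
            if current is None:
                return False
        return self.nullable if current is None else current in self.last
```

For instance, (a+b)*a is rejected because a occurs twice in its first set, while the equivalent b*a(b*a)* passes the test and accepts "aba" but not "ab".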