Simplifying regular expressions: a quantitative perspective

  • Authors: Hermann Gruber; Stefan Gulan

  • Affiliations: Institut für Informatik, Universität Gießen, Gießen, Germany; Fachbereich IV—Informatik, Universität Trier, Trier, Germany

  • Venue: LATA'10 Proceedings of the 4th International Conference on Language and Automata Theory and Applications
  • Year: 2010

Abstract

We consider the efficient simplification of regular expressions and suggest a quantitative comparison of heuristics for simplifying regular expressions. To this end, we propose a new normal form for regular expressions, which outperforms previous heuristics while still being computable in linear time. This allows us to determine an exact bound for the relation between the two prevalent measures for regular expression size: alphabetic width and reverse polish notation length. In addition, we show that every regular expression of alphabetic width $n$ can be converted into a nondeterministic finite automaton with ε-transitions of size at most $4\frac{2}{5}n+1$, and prove this bound to be optimal. This answers a question posed by Ilie and Yu, who had obtained lower and upper bounds of $4n-1$ and $9n-\frac{1}{2}$, respectively [15]. For reverse polish notation length as input size measure, an optimal bound was recently determined by Gulan and Fernau [14]. We prove that, under mild restrictions, their construction is also optimal when taking alphabetic width as input size measure.
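
The two size measures named in the abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in the literature (alphabetic width counts occurrences of alphabet symbols; reverse polish notation length counts all nodes of the expression's syntax tree), which the abstract itself does not spell out. The class and function names below are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical syntax-tree node types for regular expressions.
@dataclass(frozen=True)
class Sym:
    char: str  # one occurrence of an alphabet symbol


@dataclass(frozen=True)
class Eps:
    pass  # the empty word


@dataclass(frozen=True)
class Star:
    inner: "Regex"


@dataclass(frozen=True)
class Concat:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Alt:
    left: "Regex"
    right: "Regex"


Regex = Union[Sym, Eps, Star, Concat, Alt]


def alphabetic_width(r: Regex) -> int:
    """Count occurrences of alphabet symbols (assumed definition)."""
    if isinstance(r, Sym):
        return 1
    if isinstance(r, Eps):
        return 0
    if isinstance(r, Star):
        return alphabetic_width(r.inner)
    return alphabetic_width(r.left) + alphabetic_width(r.right)


def rpn_length(r: Regex) -> int:
    """Count all syntax-tree nodes, operators included (assumed definition)."""
    if isinstance(r, (Sym, Eps)):
        return 1
    if isinstance(r, Star):
        return 1 + rpn_length(r.inner)
    return 1 + rpn_length(r.left) + rpn_length(r.right)


# Example expression: (a + b)* a
example = Concat(Star(Alt(Sym("a"), Sym("b"))), Sym("a"))
print(alphabetic_width(example), rpn_length(example))
```

For the expression (a + b)* a, the sketch counts three symbol occurrences and six syntax-tree nodes (three symbols, one union, one star, one concatenation), which shows how the two measures can differ on the same expression.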