Tools for Very Fast Regular Expression Matching

  • Authors:
  • Davide Pasetto; Fabrizio Petrini; Virat Agarwal

  • Affiliations:
  • IBM Computational Science Center, Ireland; IBM T.J. Watson Research Center; IBM T.J. Watson Research Center

  • Venue:
  • Computer
  • Year:
  • 2010

Quantified Score

Hi-index 4.10

Abstract

Regular expressions, or regex, are a common choice for defining configurable rules for data parsing because of their expressiveness in detecting recurrent patterns and information. For many data-intensive applications, regex matching is the first line of defense in performing online data filtering. Unfortunately, few solutions can keep up with increasing data rates and with the complexity posed by sets of hundreds of expressions. DotStar addresses this problem by providing a complete algorithmic solution and a software tool chain that compiles large sets of user-provided regex first into a sequence of intermediate representations and then into an automaton that searches for matches in a single pass, without backtracking. The entire software tool chain supports the extended POSIX standard syntax for regex.
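
To make the idea of "one automaton, one pass, no backtracking" concrete, the sketch below uses the classic Aho-Corasick construction as a simplified stand-in. It is an illustration only and not the DotStar tool chain: it handles literal strings rather than full POSIX regex, and the names `build_automaton` and `scan` are hypothetical. What it does share with the approach described in the abstract is the overall shape: a whole set of patterns is compiled once into a single state machine, and the input is then read left to right exactly once, with no character re-read.

```python
from collections import deque


def build_automaton(patterns):
    """Compile literal patterns into goto/fail/output tables (Aho-Corasick)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> indices of patterns that end in this state

    # Phase 1: insert every pattern into a shared trie.
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)

    # Phase 2: breadth-first pass that fills in the failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the fallback state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan(text, automaton, patterns):
    """Report (start_index, pattern) for every match, reading text once."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for pos, ch in enumerate(text):
        # Follow failure links instead of rewinding the input.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            matches.append((pos - len(patterns[idx]) + 1, patterns[idx]))
    return matches


if __name__ == "__main__":
    rules = ["he", "she", "his", "hers"]
    machine = build_automaton(rules)
    print(scan("ushers", machine, rules))
```

The compile step is paid once per rule set, while the scan cost grows with the input length plus the number of reported matches rather than with the number of rules, which is the property that matters for online filtering at high data rates.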