A-DFA: A Time- and Space-Efficient DFA Compression Algorithm for Fast Regular Expression Evaluation

  • Authors:
  • Michela Becchi;Patrick Crowley

  • Affiliations:
  • University of Missouri;Washington University in St. Louis

  • Venue:
  • ACM Transactions on Architecture and Code Optimization (TACO)
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Modern network intrusion detection systems need to perform regular expression matching at line rate in order to detect the occurrence of critical patterns in packet payloads. While Deterministic Finite Automata (DFAs) allow this operation to be performed in linear time, they may exhibit prohibitive memory requirements. Kumar et al. [2006a] have proposed Delayed Input DFAs (D2FAs), which provide a trade-off between the memory requirements of the compressed DFA and the number of states visited for each character processed, which in turn affects the memory bandwidth required to evaluate regular expressions. In this article we introduce Amortized time − bandwidth overhead DFAs (A − DFAs), a general compression technique that results in at most N(k + 1)/k state traversals when processing a string of length N, k being a positive integer. In comparison to the D2FA approach, our technique achieves comparable levels of compression with lower provable bounds on memory bandwidth (or greater compression for a given bandwidth bound). Moreover, the A-DFA algorithm has lower complexity, can be applied during DFA creation, and is suitable for scenarios where a compressed DFA needs to be dynamically built or updated. Finally, we show how to combine A-DFA with alphabet reduction and multistride DFAs, two techniques aimed at reducing the memory space and bandwidth requirement of DFAs, and discuss memory encoding schemes suitable for A-DFAs.