CompactDFA: generic state machine compression for scalable pattern matching

  • Authors:
  • Anat Bremler-Barr;David Hay;Yaron Koral

  • Affiliations:
  • Computer Science Dept., Interdisciplinary Center, Herzliya, Israel;Dept. of Electrical Engineering, Columbia University, NY;Computer Science Dept., Interdisciplinary Center, Herzliya, Israel

  • Venue:
  • INFOCOM'10 Proceedings of the 29th conference on Information communications
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Pattern matching algorithms lie at the core of all contemporary Intrusion Detection Systems (IDS), making it intrinsic to reduce their speed and memory requirements. This paper focuses on the most popular class of pattern-matching algorithms, the Aho-Corasick-like algorithms, which are based on constructing and traversing a Deterministic Finite Automaton (DFA), representing the patterns. While this approach ensures deterministic time guarantees, modern IDSs need to deal with hundreds of patterns, thus requiring to store very large DFAs which usually do not fit in fast memory. This results in a major bottleneck on the throughput of the IDS, as well as its power consumption and cost. We propose a novel method to compress DFAs by observing that the name of the states is meaningless. While regular DFAs store separately each transition between two states, we use this degree of freedom and encode states in such a way that all transitions to a specific state can be represented by a single prefix that defines a set of current states. Our technique applies to a large class of automata, which can be categorized by simple properties. Then, the problem of pattern matching is reduced to the well-studied problem of Longest Prefix Matching (LPM) that can be solved either in TCAM, in commercially available IP-lookup chips, or in software. Specifically, we show that with a TCAM our scheme can reach a throughput of 10 Gbps with low power consumption.