An efficient multicharacter transition string-matching engine based on the aho-corasick algorithm

  • Authors:
  • Chien-Chi Chen;Sheng-De Wang

  • Affiliations:
  • National Taiwan University;National Taiwan University

  • Venue:
  • ACM Transactions on Architecture and Code Optimization (TACO)
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

A string-matching engine capable of inspecting multiple characters in parallel can multiply the throughput. However, the space required for implementing a matching engine that can process multiple characters in parallel generally grows exponentially with respect to the characters to be processed in parallel. Based on the Aho-Corasick algorithm (AC-algorithm), this work presents a novel multicharacter transition Nondeterministic Finite Automaton (NFA) approach, called multicharacter AC-NFA, to allow for the inspection of multiple characters in parallel. This approach first converts an AC-trie to an AC-NFA by allowing for the simultaneous activation of multiple states and then converts the AC-NFA to a k-character AC-NFA by an algorithm with concatenation operations and assistant transitions. Additionally, the alignment problem, which occurs while multiple characters are being inspected in parallel, is solved using assistant transitions. Moreover, a corresponding output is provided for each inspected character by introducing priority multiplexers to determine the final matching outputs during implementation of the multicharacter AC-NFA. Consequently, the number of derived k-character transitions grows linearly with respect to the number k. Furthermore, the derived multicharacter AC-NFA is implemented on FPGAs for evaluation. The resulting throughput grows approximately 14 times and the hardware cost grows about 18 times for 16-character AC-NFA implementation, as compared with that for 1-character AC-NFA implementation. The achievable throughput is 21.4Gbps for the 16-character AC-NFA implementation operating at a 167.36MHz clock.