An efficient multicharacter transition string-matching engine based on the Aho-Corasick algorithm

Authors:
Chien-Chi Chen; Sheng-De Wang
Affiliations:
National Taiwan University; National Taiwan University
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2013

Abstract

A string-matching engine that inspects multiple characters in parallel can multiply its throughput. However, the space required to implement such an engine generally grows exponentially with the number of characters processed in parallel. Based on the Aho-Corasick algorithm (AC-algorithm), this work presents a novel multicharacter transition Nondeterministic Finite Automaton (NFA) approach, called multicharacter AC-NFA, that allows multiple characters to be inspected in parallel. The approach first converts an AC-trie into an AC-NFA by allowing multiple states to be active simultaneously, and then converts the AC-NFA into a k-character AC-NFA using an algorithm based on concatenation operations and assistant transitions. The assistant transitions also solve the alignment problem that arises when multiple characters are inspected in parallel. Moreover, priority multiplexers are introduced in the implementation of the multicharacter AC-NFA to determine the final matching outputs, so that a corresponding output is provided for each inspected character. As a result, the number of derived k-character transitions grows only linearly with k. The derived multicharacter AC-NFA is implemented on FPGAs for evaluation. Compared with the 1-character AC-NFA implementation, the 16-character AC-NFA implementation increases throughput by a factor of approximately 14 at a hardware cost about 18 times higher. The achievable throughput is 21.4 Gbps for the 16-character AC-NFA implementation operating at a 167.36 MHz clock.
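The abstract describes the AC-NFA only at a high level. As an illustration, here is a minimal software sketch of the two ideas it relies on, written under stated assumptions. First, the AC-trie is run as an NFA in which several states stay active at once, so no failure links are needed. Second, the input is consumed k characters per step, and the always-active root is re-entered at every offset inside the block so that a pattern may begin at any alignment. The names `build_trie` and `match_k` are hypothetical. The "longest pattern wins" rule is only a stand-in for the paper's priority multiplexers. This is not the authors' hardware construction, which precomputes concatenated k-character and assistant transitions instead of walking the trie one character at a time.

```python
def build_trie(patterns):
    """Build an AC-trie (goto function only).

    No failure links are built: the NFA view keeps every partially
    matched state active simultaneously instead.
    """
    goto = [{}]      # goto[state][char] -> child state; state 0 is the root
    output = [None]  # output[state] -> pattern that ends at this state, if any
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto[state][ch] = len(goto)
                goto.append({})
                output.append(None)
            state = goto[state][ch]
        output[state] = pat
    return goto, output


def match_k(text, patterns, k):
    """Scan `text` k characters per step with a multi-active-state AC-NFA.

    Returns one entry per input character: the longest pattern ending at
    that character, or None. "Longest wins" is an assumed priority rule
    standing in for the per-character priority multiplexers.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    goto, output = build_trie(patterns)
    results = [None] * len(text)
    active = set()  # non-root states that were active at the last block boundary

    for base in range(0, len(text), k):
        block = text[base:base + k]
        next_active = set()
        # Threads carried over from the previous block enter at offset 0.
        # The root is re-entered at every offset (the alignment problem).
        starts = [(s, 0) for s in active] + [(0, j) for j in range(len(block))]
        for state, offset in starts:
            for i in range(offset, len(block)):
                state = goto[state].get(block[i])
                if state is None:
                    break  # this NFA thread dies inside the block
                pat = output[state]
                if pat is not None:
                    pos = base + i
                    # Per-character output: keep the highest-priority match.
                    if results[pos] is None or len(pat) > len(results[pos]):
                        results[pos] = pat
            else:
                next_active.add(state)  # thread survived to the block boundary
        active = next_active
    return results


if __name__ == "__main__":
    print(match_k("ushers", ["he", "she", "his", "hers"], k=4))
    # prints [None, None, None, 'she', None, 'hers']
```

Because the result does not depend on k, the sketch can be checked against k = 1, which is an ordinary one-character AC-NFA scan. In hardware, the loop over offsets and carried states would be unrolled into parallel transitions evaluated in one clock cycle. Here it remains sequential, so the sketch illustrates the alignment and per-character-output logic, not the throughput gain.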
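Assuming each character is one byte (8 bits), the quoted 16-character throughput follows from the number of characters consumed per clock cycle and the clock frequency:

```latex
T = k \times 8\,\text{bits} \times f_{\text{clk}}
  = 16 \times 8\,\text{bits} \times 167.36\,\text{MHz}
  = 21{,}422.08\,\text{Mbps} \approx 21.4\,\text{Gbps}
```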