Multi-Stride String Searching for High-Speed Content Inspection

  • Authors:
  • Derek Pao;Xing Wang

  • Affiliations:
  • -;-

  • Venue:
  • The Computer Journal
  • Year:
  • 2012


Abstract

The design of hardware-assisted high-speed string-matching engines for content inspection has been an active research topic. Scalability, flexibility and speed are the three major challenges. In this paper, we present a high-speed string-matching engine for virus scanning that can process 3 bytes of input data per cycle. Our design uses a memory-based architecture. The hardware circuits need not be reconfigured when the pattern set is updated. We evaluate our design using the ClamAV virus database with over 82K patterns, and the memory cost of our method is about 2.4 MB. The proposed method is an improved version of our previously published method called quick sampling with on-demand verification. The previous design has a memory cost of 1.4 MB and a throughput of 1 byte per cycle. Two novel architectural features are incorporated into the new design, namely a new technique to construct near-minimal dynamic perfect hash tables using the bit-shuffle approach, and the introduction of a new concept called byte-shift invariant code (BSIC). With the BSIC, a suffix verification unit can be shared by multiple prefix sampling units. Hence, the processing rate of the new design is three times that of the old design, while the memory cost increases by only 72%.
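The general idea of sampling prefixes at several byte offsets per cycle and verifying the rest of a pattern only on a prefix hit can be illustrated with a minimal software sketch. This is an analogy under stated assumptions, not the authors' hardware design: the 4-byte prefix width, the plain dictionary standing in for the perfect hash table, and the names `build_prefix_table` and `scan` are all hypothetical, and the bit-shuffle hashing and BSIC are not modelled.

```python
PREFIX_LEN = 4  # assumed sampling width; not taken from the paper
STRIDE = 3      # bytes consumed per simulated cycle, as stated in the abstract


def build_prefix_table(patterns):
    """Map each pattern's first PREFIX_LEN bytes to the patterns sharing it."""
    table = {}
    for p in patterns:
        if len(p) < PREFIX_LEN:
            raise ValueError("pattern shorter than the sampled prefix")
        # A dict stands in for the hash table; this is an assumption.
        table.setdefault(p[:PREFIX_LEN], []).append(p)
    return table


def scan(data, table):
    """Return (matches, cycles); matches is a list of (offset, pattern)."""
    matches = []
    cycles = 0
    for base in range(0, len(data), STRIDE):
        cycles += 1
        # In hardware the STRIDE samplers would work in parallel within
        # one cycle; here they are simulated one after another.
        for lane in range(STRIDE):
            pos = base + lane
            candidates = table.get(data[pos:pos + PREFIX_LEN], ())
            # On-demand verification: compare the full pattern only
            # after its prefix has been sampled successfully.
            for p in candidates:
                if data.startswith(p, pos):
                    matches.append((pos, p))
    return matches, cycles


if __name__ == "__main__":
    table = build_prefix_table([b"virus123", b"malware!"])
    print(scan(b"xxvirus123yymalware!z", table))
```

In this sketch a 21-byte input is consumed in 7 simulated cycles, against 21 for a 1-byte-per-cycle scanner, which mirrors the threefold throughput gain described in the abstract.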