String Searching Engine for Virus Scanning

  • Authors:
  • Derek Pao;Xing Wang;Xiaoran Wang;Cong Cao;Yuesheng Zhu

  • Affiliations:
  • City University of Hong Kong, Hong Kong;Peking University, Shenzhen;City University of Hong Kong, Hong Kong;City University of Hong Kong, Hong Kong;Peking University, Shenzhen

  • Venue:
  • IEEE Transactions on Computers
  • Year:
  • 2011

Quantified Score

Hi-index 14.98

Visualization

Abstract

A memory-efficient hardware string searching engine for antivirus applications is presented. The proposed QSV method is based on quick sampling of the input stream against fixed-length pattern prefixes, and on-demand verification of variable-length pattern suffixes. Patterns handled by the QSV method are required to have at least 16 bytes, and possess distinct 16-byte prefixes. The latter requirement can be fulfilled by a preprocessing procedure. The search engine uses the pipelined Aho-Corasick (P-AC) architecture developed by the first author to process 4 to 15-byte short patterns and a small number of exception cases. Our design was evaluated using the ClamAV virus database having 82,888 strings with a total size that exceeds 8 Mbyte. In terms of byte count, 99.3 percent of the pattern set is handled by the QSV method and 0.7 percent of the pattern set is handled by P-AC. A pattern with distinct 16-byte prefix only occupies up to three lookup table entries in QSV. The overall memory cost of our system is about 1.4 Mbyte, i.e., 1.4 bit per character of the ClamAV pattern set. The proposed method is memory-based, hence, updates to the pattern set can be accommodated by modifying the contents of the lookup tables without reconfiguring the hardware circuits.