Automatic Generation of String Signatures for Malware Detection

  • Authors:
  • Kent Griffin;Scott Schneider;Xin Hu;Tzi-Cker Chiueh

  • Affiliations:
  • Symantec Research Laboratories,;Symantec Research Laboratories,;Symantec Research Laboratories,;Symantec Research Laboratories,

  • Venue:
  • RAID '09 Proceedings of the 12th International Symposium on Recent Advances in Intrusion Detection
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Scanning files for signatures is a proven technology, but exponential growth in unique malware programs has caused an explosion in signature database sizes. One solution to this problem is to use string signatures , each of which is a contiguous byte sequence that potentially can match many variants of a malware family. However, it is not clear how to automatically generate these string signatures with a sufficiently low false positive rate. Hancock is the first string signature generation system that takes on this challenge on a large scale. To minimize the false positive rate, Hancock features a scalable model that estimates the occurrence probability of arbitrary byte sequences in goodware programs, a set of library code identification techniques, and diversity-based heuristics that ensure the contexts in which a signature is embedded in containing malware files are similar to one another. With these techniques combined, Hancock is able to automatically generate string signatures with a false positive rate below 0.1%.