Anagram: a content anomaly detector resistant to mimicry attack

  • Authors:
  • Ke Wang;Janak J. Parekh;Salvatore J. Stolfo

  • Affiliations:
  • Computer Science Department, Columbia University, New York, NY;Computer Science Department, Columbia University, New York, NY;Computer Science Department, Columbia University, New York, NY

  • Venue:
  • RAID'06 Proceedings of the 9th international conference on Recent Advances in Intrusion Detection
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we present Anagram, a content anomaly detector that models a mixture ofhigh-order n-grams (n 1) designed to detect anomalous and “suspicious” network packet payloads. By using higher-order n-grams, Anagram can detect significant anomalous byte sequences and generate robust signatures of validated malicious packet content. The Anagram content models are implemented using highly efficient Bloom filters, reducing space requirements and enabling privacy-preserving cross-site correlation. The sensor models the distinct content flow of a network or host using a semi-supervised training regimen. Previously known exploits, extracted from the signatures of an IDS, are likewise modeled in a Bloom filter and are used during training as well as detection time. We demonstrate that Anagram can identify anomalous traffic with high accuracy and low false positive rates. Anagram's high-order n-gram analysis technique is also resilient against simple mimicry attacks that blend exploits with “normal” appearing byte padding, such as the blended polymorphic attack recently demonstrated in [1]. We discuss randomized n-gram models, which further raises the bar and makes it more difficult for attackers to build precise packet structures to evade Anagram even if they know the distribution of the local site content flow. Finally, Anagram's speed and high detection rate makes it valuable not only as a standalone sensor, but also as a network anomaly flow classifier in an instrumented fault-tolerant host-based environment; this enables significant cost amortization and the possibility of a “symbiotic” feedback loop that can improve accuracy and reduce false positive rates over time.