Polymorphic worm detection and defense: system design, experimental methodology, and data resources

  • Authors:
  • Jisheng Wang; Ihab Hamadeh; George Kesidis; David J. Miller

  • Affiliations:
  • Penn State University, University Park, PA; Penn State University, University Park, PA; Penn State University, University Park, PA; Penn State University, University Park, PA

  • Venue:
  • Proceedings of the 2006 SIGCOMM workshop on Large-scale attack defense
  • Year:
  • 2006

Abstract

The polymorphic variety of Internet worms presents a formidable challenge to network intrusion detection and to methods designed to extract payload signatures for worm containment. Recently, several systems, including Earlybird and Polygraph, have been proposed, based on efficient processing of payloads to extract signatures that are either explicitly indicative of an attack (exploit code strings) or that have unusual statistical character (content prevalence, address dispersion) consistent with worm activity. While these works are seminal, the systems have limitations that affect the accuracy of the extracted signatures and/or the practicability of deployment. Earlybird's signature extraction is fragile to polymorphism, while Polygraph makes assumptions about data availability and the accuracy of front-end flow classification. Polygraph also has high complexity.

We propose a new method that integrates header-based multidimensional flow clustering, as front-end processing, with content sifting (signature extraction) performed separately on each cluster in the (small) subset of clusters identified as suspicious. Front-end clustering improves the purity of the (separate) signature pools and also reduces complexity. We apply a "suffix tree" approach to signature extraction, gleaning both length and frequency information. We demonstrate the efficacy of our approach on a (background) trace taken from a /24 in Taiwan, which we salt with worm traffic based on two realistic polymorphic mechanisms that we propose. Since there is a dearth of public data for such testing, we have also made an anonymized version of this trace available, based on randomized headers and fingerprinted payloads.
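To make the idea of per-cluster content sifting concrete, the following is a minimal sketch, not the authors' implementation. It assumes the payloads of one suspicious cluster are already available as byte strings, and it swaps the suffix tree for naive substring enumeration (quadratic in payload length) so that the example stays short. The function name `frequent_substrings` and the parameters `min_len` and `min_flows` are hypothetical. What it shares with the approach in the abstract is only that each candidate signature is reported with both its length and the number of flows containing it.

```python
from collections import defaultdict


def frequent_substrings(payloads, min_len=4, min_flows=2):
    """Return {substring: number of payloads containing it}.

    Naive stand-in for a suffix-tree pass over one cluster's payload pool.
    Only maximal substrings are kept: a candidate is dropped when a longer
    candidate contains it and occurs in the same number of payloads.
    """
    # Map each substring to the set of payload indices in which it occurs.
    seen_in = defaultdict(set)
    for idx, payload in enumerate(payloads):
        n = len(payload)
        for start in range(n):
            for end in range(start + min_len, n + 1):
                seen_in[payload[start:end]].add(idx)

    # Keep substrings that appear in at least min_flows distinct payloads.
    counts = {s: len(ids) for s, ids in seen_in.items() if len(ids) >= min_flows}

    # Discard candidates subsumed by a longer candidate with the same count.
    maximal = {}
    for s, c in counts.items():
        subsumed = any(s != t and s in t and counts[t] == c for t in counts)
        if not subsumed:
            maximal[s] = c
    return maximal


if __name__ == "__main__":
    # Hypothetical cluster: two flows share an invariant string, one does not.
    cluster = [b"xxGET /evilAAAA", b"yyGET /evilBBBB", b"zzzzzzzzz"]
    for sig, flows in frequent_substrings(cluster).items():
        print(len(sig), flows, sig)
```

On the toy cluster above, the only surviving candidate is `b"GET /evil"` (length 9, present in 2 flows). Running this separately on each suspicious cluster, rather than on all traffic at once, is what keeps each signature pool small and relatively free of benign content.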