Experimental challenges in cyber security: a story of provenance and lineage for malware

  • Authors:
  • Tudor Dumitras; Iulian Neamtiu

  • Affiliations:
  • Symantec Research Labs; University of California, Riverside

  • Venue:
  • CSET'11 Proceedings of the 4th conference on Cyber security experimentation and test
  • Year:
  • 2011

Abstract

Rigorous experiments and empirical studies hold the promise of empowering researchers and practitioners to develop better approaches to cyber security. For example, understanding the provenance and lineage of polymorphic malware strains can lead to new techniques for detecting and classifying unknown attacks. Unfortunately, many challenges stand in the way: the lack of sufficient field data (e.g., malware samples and contextual information about their impact in the real world), the lack of metadata about how existing data sets were collected, the lack of ground truth, and the difficulty of developing tools and methods for rigorous data analysis. As a first step towards rigorous experimental methods, we introduce two techniques for reconstructing the phylogenetic trees and dynamic control-flow graphs of unknown binaries, inspired by research in software evolution, bioinformatics, and time series analysis. Our approach is based on the observation that the long evolution histories of open source projects provide an opportunity for creating precise models of lineage and provenance, which can also be used for detecting and clustering malware. As a second step, we present experimental methods that combine a representative corpus of malware and contextual information (gathered from end hosts rather than from network traces or honeypots) with sound data collection and analysis techniques. While our experimental methods serve a concrete purpose (understanding lineage and provenance), they also provide a general blueprint for addressing threats to the validity of cyber security studies.