Using file relationships in malware classification

  • Authors:
  • Nikos Karampatziakis;Jack W. Stokes;Anil Thomas;Mady Marinescu

  • Affiliations:
  • Microsoft Corporation, Redmond, WA;Microsoft Corporation, Redmond, WA;Microsoft Corporation, Redmond, WA;Microsoft Corporation, Redmond, WA

  • Venue:
  • DIMVA'12 Proceedings of the 9th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Typical malware classification methods analyze unknown files in isolation. However, this ignores valuable relationships between malware files, such as containment in a zip archive, dropping, or downloading. We present a new malware classification system based on a graph induced by file relationships, and, as a proof of concept, analyze containment relationships, for which we have much available data. However our methodology is general, relying only on an initial estimate for some of the files in our data and on propagating information along the edges of the graph. It can thus be applied to other types of file relationships. We show that since malicious files are often included in multiple malware containers, the system's detection accuracy can be significantly improved, particularly at low false positive rates which are the main operating points for automated malware classifiers. For example at a false positive rate of 0.2%, the false negative rate decreases from 42.1% to 15.2%. Finally, the new system is highly scalable; our basic implementation can learn good classifiers from a large, bipartite graph including over 719 thousand containers and 3.4 million files in a total of 16 minutes.