Exploring discriminatory features for automated malware classification

  • Authors:
  • Guanhua Yan; Nathan Brown; Deguang Kong

  • Affiliations:
  • Information Sciences (CCS-3), Los Alamos National Laboratory; Department of Electrical and Computer Engineering, Naval Postgraduate School; Department of Computer Science, University of Texas, Arlington

  • Venue:
  • DIMVA'13 Proceedings of the 10th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment
  • Year:
  • 2013

Abstract

The ever-growing malware threat in cyberspace calls for techniques that are more effective than widely deployed signature-based detection systems and more scalable than manual reverse engineering by forensic experts. To counter large volumes of malware variants, machine learning techniques have recently been applied to automated malware classification. Despite the successes of these efforts, we still lack a basic understanding of some key issues, such as which features to use and which classifiers perform well on malware data. Against this backdrop, the goal of this work is to explore discriminatory features for automated malware classification. We conduct a systematic study of the discriminative power of various types of features extracted from malware programs, and experiment with different combinations of feature selection algorithms and classifiers. Our results not only offer insights into which features best distinguish malware families, but also shed light on how to develop scalable techniques for automated malware classification in practice.