Automatic malware categorization using cluster ensemble

  • Authors:
  • Yanfang Ye;Tao Li;Yong Chen;Qingshan Jiang

  • Affiliations:
  • Xiamen University, Xiamen, China;Florida International University, Miami, FL, USA;Kingsoft Corporation, Zhuhai, China;Xiamen University, Xiamen, China

  • Venue:
  • Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, resting on the analysis of instruction frequency and function-based instruction sequences, we develop an Automatic Malware Categorization System (AMCS) for automatically grouping malware samples into families that share some common characteristics using a cluster ensemble by aggregating the clustering solutions generated by different base clustering algorithms. We propose a principled cluster ensemble framework for combining individual clustering solutions based on the consensus partition. The domain knowledge in the form of sample-level constraints can be naturally incorporated in the ensemble framework. In addition, to account for the characteristics of feature representations, we propose a hybrid hierarchical clustering algorithm which combines the merits of hierarchical clustering and k-medoids algorithms and a weighted subspace K-medoids algorithm to generate base clusterings. The categorization results of our AMCS system can be used to generate signatures for malware families that are useful for malware detection. The case studies on large and real daily malware collection from Kingsoft Anti-Virus Lab demonstrate the effectiveness and efficiency of our AMCS system.