Weighted linear kernel with tree transformed features for malware detection

  • Authors:
  • Prakash Mandayam Comar;Lei Liu;Sabyasachi Saha;Antonio Nucci;Pang-Ning Tan

  • Affiliations:
  • Michigan State University, East Lansing, MI, USA;Michigan State University, East Lansing, MI, USA;Narus Inc, Sunnyvale, CA, USA;Narus Inc, Sunnyvale, CA, USA;Michigan State University, East Lansing, MI, USA

  • Venue:
  • Proceedings of the 21st ACM international conference on Information and knowledge management
  • Year:
  • 2012

Abstract

Malware detection from network traffic flows is a challenging problem due to data irregularity issues such as imbalanced class distribution, noise, missing values, and heterogeneous types of features. To address these challenges, this paper presents a two-stage classification approach for malware detection. The framework initially employs random forest as a macro-level classifier to separate the malicious from non-malicious network flows, followed by a collection of one-class support vector machine classifiers to identify the specific type of malware. A novel tree-based feature construction approach is proposed to deal with data imperfection issues. As the performance of the support vector machine classifier often depends on the kernel function used to compute the similarity between every pair of data points, designing an appropriate kernel is essential for accurate identification of malware classes. We present a simple algorithm to construct a weighted linear kernel on the tree transformed features and demonstrate its effectiveness in detecting malware from real network traffic data.
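
The abstract does not spell out how the tree transformed features or the kernel weights are computed, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that each tree maps a flow record to a leaf id, that a missing value is routed to its own leaf, and that the kernel is a weighted count of the trees in which two records land in the same leaf. With non-negative weights this equals a weighted dot product of one-hot leaf indicator vectors, which is what makes it a linear kernel. The names `make_stump`, `tree_transform`, and `weighted_linear_kernel`, the feature names, and the weight values are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence

Record = Dict[str, Optional[float]]
Tree = Callable[[Record], int]


def make_stump(feature: str, threshold: float) -> Tree:
    """Hypothetical one-split tree standing in for a tree of a trained forest.

    Leaf 0 = feature missing, leaf 1 = value below threshold,
    leaf 2 = value at or above threshold.
    """
    def leaf(record: Record) -> int:
        value = record.get(feature)
        if value is None:
            return 0
        return 1 if value < threshold else 2
    return leaf


def tree_transform(record: Record, trees: Sequence[Tree]) -> List[int]:
    """Map a raw record to one leaf id per tree (the transformed features)."""
    return [tree(record) for tree in trees]


def weighted_linear_kernel(a: Sequence[int], b: Sequence[int],
                           weights: Sequence[float]) -> float:
    """Sum the weights of the trees in which both records share a leaf."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("leaf vectors and weights must have equal length")
    return sum(w for la, lb, w in zip(a, b, weights) if la == lb)


# Assumed example data: two stumps and caller-supplied per-tree weights.
trees = [make_stump("bytes", 1000.0), make_stump("duration", 5.0)]
weights = [0.75, 0.25]

x = {"bytes": 500.0, "duration": 10.0}
y = {"bytes": 800.0}                     # duration is missing
z = {"bytes": 200.0, "duration": 7.5}

lx, ly, lz = (tree_transform(r, trees) for r in (x, y, z))
k_xy = weighted_linear_kernel(lx, ly, weights)  # agree on the first tree only
k_xz = weighted_linear_kernel(lx, lz, weights)  # agree on both trees
```

In this sketch the heterogeneous or missing raw values never enter the kernel directly; only leaf membership does. A Gram matrix built from such a function could, in principle, be handed to a one-class SVM that accepts a precomputed kernel.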