Weighted linear kernel with tree transformed features for malware detection

Authors:
Prakash Mandayam Comar;Lei Liu;Sabyasachi Saha;Antonio Nucci;Pang-Ning Tan
Affiliations:
Michigan State University, East Lansing, MI, USA;Michigan State University, East Lansing, MI, USA;Narus Inc, Sunnyvale, CA, USA;Narus Inc, Sunnyvale, CA, USA;Michigan State University, East Lansing, MI, USA
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Abstract

Malware detection from network traffic flows is a challenging problem due to data irregularity issues such as imbalanced class distribution, noise, missing values, and heterogeneous feature types. To address these challenges, this paper presents a two-stage classification approach for malware detection. The framework first employs a random forest as a macro-level classifier to separate malicious from non-malicious network flows, followed by a collection of one-class support vector machine classifiers that identify the specific type of malware. A novel tree-based feature construction approach is proposed to deal with data imperfection issues. Because the performance of a support vector machine classifier often depends on the kernel function used to compute the similarity between every pair of data points, designing an appropriate kernel is essential for accurately identifying malware classes. We present a simple algorithm to construct a weighted linear kernel on the tree-transformed features and demonstrate its effectiveness in detecting malware from real network traffic data.
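The abstract does not spell out how the tree-transformed features or the kernel weights are computed, so the following is only a minimal sketch under stated assumptions. It assumes a common form of tree-based feature construction: each flow is encoded by which leaf it reaches in every tree, as one concatenated one-hot vector. A weighted linear kernel on those vectors, K(u, v) = sum_d w_d u_d v_d, then adds up the weights of the leaves two flows share. The tree encoding, the "missing value goes left" rule, and the helper names (`leaf_of`, `tree_transform`, `expand_tree_weights`, `weighted_linear_kernel`, `gram_matrix`) are all hypothetical. The per-tree weights are supplied by hand here rather than learned as in the paper. With uniform weights 1/T over T trees, the kernel reduces to Breiman's random-forest proximity, the fraction of trees in which two points land in the same leaf.

```python
from typing import List, Optional, Sequence, Tuple, Union

# Hypothetical tree encoding: a leaf is an int id (0..L-1 within its tree);
# an internal node is (feature_index, threshold, left_child, right_child).
Node = Union[int, Tuple[int, float, "Node", "Node"]]


def leaf_of(tree: Node, x: Sequence[Optional[float]]) -> int:
    """Route x down the tree and return the id of the leaf it reaches."""
    node = tree
    while not isinstance(node, int):
        feat, thr, left, right = node
        value = x[feat]
        # Assumed missing-value rule (not from the paper): None goes left.
        node = left if value is None or value <= thr else right
    return node


def n_leaves(tree: Node) -> int:
    """Number of leaves, assuming leaf ids are numbered 0..L-1."""
    if isinstance(tree, int):
        return tree + 1
    return max(n_leaves(tree[2]), n_leaves(tree[3]))


def tree_transform(forest: Sequence[Node], x) -> List[float]:
    """Concatenate one one-hot leaf-indicator block per tree."""
    features: List[float] = []
    for tree in forest:
        block = [0.0] * n_leaves(tree)
        block[leaf_of(tree, x)] = 1.0
        features.extend(block)
    return features


def expand_tree_weights(forest: Sequence[Node], tree_weights) -> List[float]:
    """Give every leaf dimension of tree t the same weight tree_weights[t]."""
    weights: List[float] = []
    for tree, w in zip(forest, tree_weights):
        weights.extend([w] * n_leaves(tree))
    return weights


def weighted_linear_kernel(u, v, weights) -> float:
    """K(u, v) = sum_d w_d * u_d * v_d on tree-transformed vectors."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))


def gram_matrix(rows, weights) -> List[List[float]]:
    """Pairwise kernel values for all transformed rows."""
    return [[weighted_linear_kernel(u, v, weights) for v in rows] for u in rows]


# Toy two-tree forest and three flows with two numeric features each.
forest = [
    (0, 0.5, 0, 1),               # stump on feature 0
    (1, 2.0, 0, (0, 1.5, 1, 2)),  # depth-2 tree
]
X = [[0.2, 1.0], [0.4, 3.0], [2.0, 3.0]]
Z = [tree_transform(forest, x) for x in X]
w = expand_tree_weights(forest, [0.5, 0.5])  # uniform 1/T per tree
K = gram_matrix(Z, w)
print(K)  # [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

In a real pipeline, leaf indices would more likely come from a trained forest, for example scikit-learn's `RandomForestClassifier.apply`. The resulting Gram matrix could then be passed to an SVM that accepts precomputed kernels, such as `OneClassSVM(kernel="precomputed")`. Replacing the uniform per-tree weights with learned ones is where the paper's weighting algorithm would enter.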