Malicious Code Detection Using Active Learning

Authors:
Robert Moskovitch;Nir Nissim;Yuval Elovici
Affiliations:
Deutsche Telekom Laboratories at Ben Gurion University, Ben Gurion University, Beer Sheva, Israel 84105;Deutsche Telekom Laboratories at Ben Gurion University, Ben Gurion University, Beer Sheva, Israel 84105;Deutsche Telekom Laboratories at Ben Gurion University, Ben Gurion University, Beer Sheva, Israel 84105
Venue:
Privacy, Security, and Trust in KDD
Year:
2009

Abstract

The recent growth in network usage has motivated the creation of new malicious code for a variety of purposes, including economic gain and other malicious ends. Dozens of new malicious programs are now created every day, and this number is expected to increase in the coming years. Today's signature-based antivirus tools and heuristic-based methods are accurate but cannot detect new malicious code. Recently, classification algorithms have been used successfully to detect malicious code. We present a complete methodology for the detection of unknown malicious code, inspired by text categorization concepts. This approach can be exploited further to acquire unknown malicious files more accurately and efficiently. To that end, we use an Active-Learning framework that selects which unknown files to acquire, enabling fast acquisition. We performed an extensive evaluation on a test collection of more than 30,000 files. We present a rigorous evaluation setup built on real-life scenarios in which the proportion of malicious files is expected to be low, about 10% of the files in the stream. We define specific evaluation measures, based on the standard precision and recall measures, that show the accuracy of the acquisition process and the improvement in the classifier resulting from efficient acquisition.