Unknown Malcode Detection Using OPCODE Representation

Authors:
Robert Moskovitch; Clint Feher; Nir Tzachar; Eugene Berger; Marina Gitelman; Shlomi Dolev; Yuval Elovici
Affiliations:
Deutsche Telekom Laboratories at Ben Gurion University, Ben Gurion University, Be'er Sheva, Israel 84105 (all authors)
Venue:
EuroISI '08 Proceedings of the 1st European Conference on Intelligence and Security Informatics
Year:
2008

References (14):

C4.5: programs for machine learning
On the Optimality of the Simple Bayesian Classifier under Zero-One Loss. Machine Learning - Special issue on learning with probabilistic representations
A vector space model for automatic indexing. Communications of the ACM
Neural Networks for Pattern Recognition
Machine Learning
Data Mining Methods for Detection of New Malicious Executables. SP '01 Proceedings of the 2001 IEEE Symposium on Security and Privacy
Editorial: special issue on learning from imbalanced data sets. ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Learning to detect malicious executables in the wild. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
N-Gram-Based Detection of New Malicious Code. COMPSAC '04 Proceedings of the 28th Annual International Computer Software and Applications Conference - Workshops and Fast Abstracts - Volume 02
Malware prevalence in the KaZaA file-sharing network. Proceedings of the 6th ACM SIGCOMM conference on Internet measurement
A Feature Selection and Evaluation Scheme for Computer Virus Detection. ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Learning to Detect and Classify Malicious Executables in the Wild. The Journal of Machine Learning Research
A brief introduction to boosting. IJCAI'99 Proceedings of the 16th international joint conference on Artificial intelligence - Volume 2

Cited by (7):

Malicious Code Detection Using Active Learning. Privacy, Security, and Trust in KDD
On deployable adversarial classification models. Proceedings of the 4th ACM workshop on Security and artificial intelligence
Feature reduction to speed up malware classification. NordSec'11 Proceedings of the 16th Nordic conference on Information Security Technology for Applications
Tracking concept drift in malware families. Proceedings of the 5th ACM workshop on Security and artificial intelligence
Opcode sequences as representation of executables for data-mining-based unknown malware detection. Information Sciences: an International Journal
VILO: a rapid learning nearest-neighbor classifier for malware triage. Journal in Computer Virology
POSTER: Detecting malware through temporal function-based features. Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security

Abstract

The recent growth in network usage has motivated the creation of new malicious code for various purposes, including economic ones. Today's signature-based antivirus tools are very accurate, but cannot detect new malicious code. Recently, classification algorithms have been employed successfully to detect unknown malicious code. However, most studies represent the binary code of the executables as byte-sequence n-grams. We propose instead the use of OpCodes (operation codes), generated by disassembling the executables, and use n-grams of the OpCodes as features for the classification process. We present a full methodology for the detection of unknown malicious code, based on text categorization concepts. We performed an extensive evaluation on a test collection of more than 30,000 files, in which we examined the OpCode n-gram representation and investigated the imbalance problem, referring to real-life scenarios in which malicious files are expected to make up about 10% of all files. Our results indicate that greater than 99% accuracy can be achieved with a training set in which the malicious file percentage is lower than 15%. This accuracy is higher than in our previous experience with the byte-sequence n-gram representation [1].
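
The OpCode n-gram representation described in the abstract can be illustrated with a minimal sketch. It assumes the executable has already been disassembled into a list of OpCode mnemonics. The function names, the sample OpCode stream, and the normalization of each count by the most frequent n-gram are assumptions made for illustration; the abstract does not specify the weighting the authors used.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def opcode_ngrams(opcodes: Sequence[str], n: int) -> List[NGram]:
    """Return every contiguous run of n OpCodes, in order of appearance."""
    # If the sequence is shorter than n, the range is empty and so is the result.
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def term_frequencies(opcodes: Sequence[str], n: int) -> Dict[NGram, float]:
    """Map each n-gram to its count divided by the count of the most
    frequent n-gram in the same file (an assumed normalization)."""
    counts = Counter(opcode_ngrams(opcodes, n))
    if not counts:
        return {}
    top = max(counts.values())
    return {gram: count / top for gram, count in counts.items()}


# Hypothetical OpCode stream, such as a disassembler might emit for a small function.
sample = ["push", "mov", "sub", "mov", "call", "mov", "call", "pop", "ret"]
tf = term_frequencies(sample, 2)
```

Each file then becomes a sparse vector indexed by n-gram, which is the form a text categorization pipeline expects. In a full system the vocabulary would be far too large to use directly, so a feature selection step would normally follow.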