Mal-ID: automatic malware detection using common segment analysis and meta-features

Authors:
Gil Tahan; Lior Rokach; Yuval Shahar
Affiliation (all authors):
Department of Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
Venue:
The Journal of Machine Learning Research
Year:
2012

Abstract

This paper proposes several novel machine-learning-based methods for detecting malware in executable files without requiring preprocessing such as unpacking or disassembly. The basic method, Mal-ID, is a new static (form-based) analysis methodology that uses common segment analysis to detect malware files. Common segment analysis allows Mal-ID to discard the parts of a malware file that originate from benign code. Mal-ID also uses a new kind of feature, termed a meta-feature, to better capture the properties of the analyzed segments. Rather than classifying the entire file as a single unit, as machine-learning-based techniques usually do, the new approach detects malware at the segment level. The study also introduces two extensions that improve on the basic Mal-ID method in various respects. We evaluated Mal-ID and its two extensions with more than ten performance measures and compared them with the highly rated boosted decision tree method under identical settings. In this evaluation, Mal-ID and both extensions outperformed the boosted decision tree method in almost all respects. The results also indicate that, once meaningful features have been extracted, a single simple detection rule is sufficient for classifying executable files.
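
The idea of discarding segments shared with benign code and then applying one simple rule can be illustrated with a toy sketch. This is not the Mal-ID algorithm from the paper. The fixed-length byte windows, the window length, the threshold, and all function names below are assumptions made for illustration, and meta-features are not modeled at all.

```python
from typing import Iterable, Set

# Hypothetical window length, kept small so the example is easy to follow.
SEGMENT_LEN = 4


def segments(data: bytes, length: int = SEGMENT_LEN) -> Set[bytes]:
    """Return the set of all overlapping byte windows of the given length."""
    return {data[i:i + length] for i in range(len(data) - length + 1)}


def build_library(files: Iterable[bytes], length: int = SEGMENT_LEN) -> Set[bytes]:
    """Collect every segment that appears in any file of a labeled corpus."""
    library: Set[bytes] = set()
    for content in files:
        library |= segments(content, length)
    return library


def malware_only_count(data: bytes, benign_lib: Set[bytes],
                       malware_lib: Set[bytes],
                       length: int = SEGMENT_LEN) -> int:
    """Count segments of `data` seen in malware but never in benign files.

    Segments that also occur in the benign library are discarded, which
    mirrors the idea of ignoring malware parts that come from benign code.
    """
    found = segments(data, length)
    return len((found & malware_lib) - benign_lib)


def is_malware(data: bytes, benign_lib: Set[bytes], malware_lib: Set[bytes],
               threshold: int = 1, length: int = SEGMENT_LEN) -> bool:
    """Single illustrative rule: flag the file if enough malware-only
    segments are present. The threshold is an assumed parameter."""
    return malware_only_count(data, benign_lib, malware_lib, length) >= threshold


if __name__ == "__main__":
    benign_lib = build_library([b"HELLOWORLD"])
    malware_lib = build_library([b"HELLOEVILCODE"])
    print(is_malware(b"XXEVILXX", benign_lib, malware_lib))
    print(is_malware(b"HELLO", benign_lib, malware_lib))
```

In this toy setting, a file that only reuses bytes also found in benign files contributes no malware-only segments, so shared library code alone does not trigger the rule.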