Opcode sequences as representation of executables for data-mining-based unknown malware detection

Authors:
Igor Santos; Felix Brezo; Xabier Ugarte-Pedrero; Pablo G. Bringas
Affiliation (all authors):
University of Deusto, Laboratory for Smartness, Semantics and Security (S3Lab), Avenida de las Universidades 24, 48007 Bilbao, Spain
Venue:
Information Sciences: an International Journal
Year:
2013

Abstract

Malware can be defined as any type of malicious code that has the potential to harm a computer or network. The volume of malware grows faster every year and poses a serious global security threat. Consequently, malware detection has become a critical topic in computer security. Currently, signature-based detection is the most widespread method in commercial antivirus software. Despite its broad use, this method can detect malware only after the malicious executable has already caused damage, and only if the malware has been adequately documented. Therefore, the signature-based method consistently fails to detect new malware. In this paper, we propose a new method to detect unknown malware families. The model is based on the frequency with which opcode sequences appear. Furthermore, we describe a technique to mine the relevance of each opcode and to assess the frequency of each opcode sequence. Finally, we empirically validate that the new method can detect unknown malware.
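The abstract names two ingredients, the frequency of opcode sequences and the relevance of each opcode, without giving formulas. The Python sketch below illustrates one plausible reading under explicitly assumed choices, not the authors' exact method: sequences are contiguous opcode n-grams; frequencies are normalized by the number of n-grams in the executable; relevance is the mutual information between an opcode's presence and the malware/benign label (a swapped-in technique); and a sequence's weight is its frequency multiplied by the product of its opcodes' relevance scores. Disassembly is out of scope (inputs are already-extracted mnemonics), and all function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log, prod


def opcode_ngrams(opcodes, n):
    """Return every contiguous sequence of n opcodes, in order of appearance."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def sequence_frequencies(opcodes, n):
    """Frequency of each length-n opcode sequence in one executable,
    normalized by the total number of length-n sequences it contains."""
    grams = opcode_ngrams(opcodes, n)
    if not grams:
        return {}
    counts = Counter(grams)
    return {g: c / len(grams) for g, c in counts.items()}


def opcode_relevance(samples, labels):
    """ASSUMPTION: relevance = mutual information (in bits) between an opcode's
    presence in an executable and its class label (1 = malware, 0 = benign)."""
    n = len(samples)
    sets = [set(s) for s in samples]
    vocab = sorted(set().union(*sets))
    relevance = {}
    for op in vocab:
        present = [op in s for s in sets]
        mi = 0.0
        for x in (True, False):
            px = sum(p == x for p in present) / n
            for y in (0, 1):
                py = sum(lab == y for lab in labels) / n
                joint = sum(p == x and lab == y for p, lab in zip(present, labels)) / n
                if joint > 0:  # joint > 0 implies px > 0 and py > 0
                    mi += joint * log(joint / (px * py), 2)
        relevance[op] = mi
    return relevance


def weighted_frequencies(freqs, relevance):
    """ASSUMPTION: a sequence's weight is its frequency times the product of the
    relevance scores of its opcodes (opcodes unseen in training count as 0)."""
    return {g: f * prod(relevance.get(op, 0.0) for op in g) for g, f in freqs.items()}


if __name__ == "__main__":
    # Hypothetical toy training set of already-disassembled opcode mnemonics.
    train = [
        ["push", "mov", "call", "xor", "jmp"],
        ["push", "mov", "call", "ret"],
        ["mov", "add", "ret"],
        ["mov", "sub", "ret"],
    ]
    labels = [1, 1, 0, 0]
    rel = opcode_relevance(train, labels)
    freqs = sequence_frequencies(["push", "mov", "call", "ret"], 2)
    print(weighted_frequencies(freqs, rel))
```

In this toy run, `mov` appears in every sample and therefore receives zero relevance, which zeroes out every sequence that contains it, while `push` and `call` separate the two classes perfectly and receive 1 bit each. The multiplicative weighting is what lets uninformative opcodes suppress the sequences they occur in.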