Zero-day malware detection based on supervised learning algorithms of API call signatures

Authors:
Mamoun Alazab; Sitalakshmi Venkatraman; Paul Watters; Moutaz Alazab
Affiliations:
University of Ballarat; University of Ballarat; University of Ballarat; Deakin University, Australia
Venue:
AusDM '11 Proceedings of the Ninth Australasian Data Mining Conference - Volume 121
Year:
2011

Abstract

Zero-day or unknown malware is created using code obfuscation techniques that modify the parent code to produce offspring copies with the same functionality but different signatures. Current techniques reported in the literature lack the capability to detect zero-day malware with the required accuracy and efficiency. In this paper, we propose and evaluate a novel method that employs several data mining techniques to detect and classify zero-day malware with high accuracy and efficiency, based on the frequency of Windows API calls. This paper describes the methodology used to collect the large data sets for training the classifiers, and analyses the performance of the various data mining algorithms adopted for the study, using a fully automated tool developed in this research to conduct the experimental investigations and evaluation. From these experimental results, we evaluate and discuss the relative advantages of the data mining algorithms for accurately detecting zero-day malware. The data mining framework employed in this research learns by analysing the behaviour of existing malicious and benign code in large datasets. We employed robust classifiers and evaluated their performance, namely the Naïve Bayes (NB) algorithm, the k-Nearest Neighbor (kNN) algorithm, the Sequential Minimal Optimization (SMO) algorithm with four different kernels (Normalized PolyKernel, PolyKernel, Puk, and Radial Basis Function (RBF)), the Backpropagation Neural Network algorithm, and the J48 decision tree. Overall, the automated data mining system implemented for this study achieved a high true positive (TP) rate of more than 98.5% and a low false positive (FP) rate of less than 0.025, a result not previously achieved in the literature.
This is much higher than the required commercial acceptance level, indicating that our novel technique is a major leap forward in detecting zero-day malware. This paper also offers future directions for researchers exploring the different aspects of obfuscation that affect the IT world today.
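The abstract describes representing each sample by the frequency of its Windows API calls and scoring classifiers by their TP and FP rates. The following is a minimal sketch of that pipeline under loudly stated assumptions: API calls are assumed to be already extracted into lists of names (the paper's automated extraction tool is not reproduced), the API vocabulary and traces are made up, and a plain k-nearest-neighbour vote with Euclidean distance stands in for the kNN classifier. The names `VOCAB`, `frequency_vector`, `knn_predict` and `tp_fp_rates` are hypothetical and illustrative only; this is not the authors' implementation.

```python
from collections import Counter
import math

# Hypothetical feature vocabulary: a handful of Windows API names chosen
# purely for illustration, not the feature set used in the paper.
VOCAB = ["CreateFileA", "WriteFile", "RegSetValueExA", "VirtualAlloc",
         "CreateRemoteThread", "GetProcAddress", "LoadLibraryA"]


def frequency_vector(api_calls, vocab=VOCAB):
    """Count how often each vocabulary API name occurs in one sample's call list."""
    counts = Counter(api_calls)
    return [counts[name] for name in vocab]


def euclidean(a, b):
    """Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`.

    `train` is a list of (vector, label) pairs, with label 1 = malware and
    0 = benign; an odd k avoids ties in this two-class setting.
    """
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def tp_fp_rates(y_true, y_pred):
    """Return (TP rate, FP rate), treating malware (label 1) as the positive class.

    TP rate = TP / (TP + FN); FP rate = FP / (FP + TN).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return tp_rate, fp_rate


# Toy API call traces (made up for this sketch).
malware_traces = [
    ["VirtualAlloc", "VirtualAlloc", "CreateRemoteThread", "GetProcAddress", "LoadLibraryA"],
    ["VirtualAlloc", "CreateRemoteThread", "CreateRemoteThread", "GetProcAddress"],
    ["RegSetValueExA", "VirtualAlloc", "CreateRemoteThread", "LoadLibraryA", "LoadLibraryA"],
]
benign_traces = [
    ["CreateFileA", "WriteFile", "WriteFile"],
    ["CreateFileA", "CreateFileA", "WriteFile", "LoadLibraryA"],
    ["CreateFileA", "WriteFile", "GetProcAddress"],
]
train = ([(frequency_vector(t), 1) for t in malware_traces]
         + [(frequency_vector(t), 0) for t in benign_traces])

# Two unseen samples: one malware-like, one benign-like.
test_traces = [
    ["VirtualAlloc", "CreateRemoteThread", "GetProcAddress", "LoadLibraryA"],
    ["CreateFileA", "WriteFile", "WriteFile", "LoadLibraryA"],
]
y_true = [1, 0]
y_pred = [knn_predict(train, frequency_vector(t), k=3) for t in test_traces]
```

On this toy data both unseen samples are classified correctly, so `tp_fp_rates(y_true, y_pred)` gives a TP rate of 1.0 and an FP rate of 0.0. In a fuller experiment the other classifiers named in the abstract (NB, SMO, backpropagation, J48) would take the place of `knn_predict`, while the frequency vectors and the rate calculation would stay the same.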