Knowledge discovery in databases: an overview
AI Magazine
Principles of data mining
Protecting Software Code by Guards
DRM '01 Revised Papers from the ACM CCS-8 Workshop on Security and Privacy in Digital Rights Management
Obfuscation of executable code to improve resistance to static disassembly
Proceedings of the 10th ACM conference on Computer and communications security
Detection of injected, dynamically generated, and obfuscated malicious code
Proceedings of the 2003 ACM workshop on Rapid malcode
Static Analyzer of Vicious Executables (SAVE)
ACSAC '04 Proceedings of the 20th Annual Computer Security Applications Conference
Host-based detection of worms through peer-to-peer cooperation
Proceedings of the 2005 ACM workshop on Rapid malcode
On the optimality of Naïve Bayes with dependent binary features
Pattern Recognition Letters
Learning to Detect and Classify Malicious Executables in the Wild
The Journal of Machine Learning Research
IMDS: intelligent malware detection system
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Eureka: A Framework for Enabling Static Malware Analysis
ESORICS '08 Proceedings of the 13th European Symposium on Research in Computer Security: Computer Security
Detecting Java Theft Based on Static API Trace Birthmark
IWSEC '08 Proceedings of the 3rd International Workshop on Security: Advances in Information and Computer Security
A static API birthmark for Windows binary executables
Journal of Systems and Software
Malware Detection Based on Suspicious Behavior Identification
ETCS '09 Proceedings of the 2009 First International Workshop on Education Technology and Computer Science - Volume 02
CIMDS: adapting postprocessing techniques of associative classification for malware detection
IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews
Towards Understanding Malware Behaviour by the Extraction of API Calls
CTC '10 Proceedings of the 2010 Second Cybercrime and Trustworthy Computing Workshop
Data mining for credit card fraud: A comparative study
Decision Support Systems
Data Mining: Practical Machine Learning Tools and Techniques
Data Mining: Practical Machine Learning Tools and Techniques
Detecting self-mutating malware using control-flow graph matching
DIMVA'06 Proceedings of the Third international conference on Detection of Intrusions and Malware & Vulnerability Assessment
Detecting malicious behaviour using supervised learning algorithms of the function calls
International Journal of Electronic Security and Digital Forensics
Hi-index | 0.00 |
Zero-day or unknown malware are created using code obfuscation techniques that can modify the parent code to produce offspring copies which have the same functionality but with different signatures. Current techniques reported in literature lack the capability of detecting zero-day malware with the required accuracy and efficiency. In this paper, we have proposed and evaluated a novel method of employing several data mining techniques to detect and classify zero-day malware with high levels of accuracy and efficiency based on the frequency of Windows API calls. This paper describes the methodology employed for the collection of large data sets to train the classifiers, and analyses the performance results of the various data mining algorithms adopted for the study using a fully automated tool developed in this research to conduct the various experimental investigations and evaluation. Through the performance results of these algorithms from our experimental analysis, we are able to evaluate and discuss the advantages of one data mining algorithm over the other for accurately detecting zero-day malware successfully. The data mining framework employed in this research learns through analysing the behavior of existing malicious and benign codes in large datasets. We have employed robust classifiers, namely Naïve Bayes (NB) Algorithm, k--Nearest Neighbor (kNN) Algorithm, Sequential Minimal Optimization (SMO) Algorithm with 4 differents kernels (SMO - Normalized PolyKernel, SMO -- PolyKernel, SMO -- Puk, and SMO- Radial Basis Function (RBF)), Backpropagation Neural Networks Algorithm, and J48 decision tree and have evaluated their performance. Overall, the automated data mining system implemented for this study has achieved high true positive (TP) rate of more than 98.5%, and low false positive (FP) rate of less than 0.025, which has not been achieved in literature so far. This is much higher than the required commercial acceptance level indicating that our novel technique is a major leap forward in detecting zero-day malware. This paper also offers future directions for researchers in exploring different aspects of obfuscations that are affecting the IT world today.