Data mining methods for malware detection using instruction sequences

Authors:
Muazzam Siddiqui;Morgan C. Wang;Joohan Lee
Affiliations:
University of Central Florida;University of Central Florida;University of Central Florida
Venue:
AIA '08 Proceedings of the 26th IASTED International Conference on Artificial Intelligence and Applications
Year:
2008

Abstract

Malicious programs pose a serious threat to computer security. Traditional signature-based detection offers little protection against new and unseen malicious programs whose signatures are not yet available. Research focus is shifting from using signature patterns to identify a specific malicious program and/or its variants toward discovering general malicious behavior in programs. This paper presents a novel idea of automatically identifying critical instruction sequences that can distinguish between malicious and clean programs using data mining techniques. Based upon general statistics gathered from these instruction sequences, we formulated the problem as a binary classification problem and built logistic regression, neural network, and decision tree models. Our approach showed a 98.4% detection rate on new programs whose data was not used in the model building process.
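
The pipeline the abstract outlines (extract instruction sequences, turn them into features, train a binary classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed-length opcode sequences, the presence/absence features, the document-frequency cutoff, the hand-written logistic regression, and the toy opcode lists are all assumptions chosen only to keep the example small and dependency-free.

```python
import math
from collections import Counter


def instruction_ngrams(opcodes, n=2):
    """Count the length-n opcode sequences in one disassembled program."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def build_vocabulary(programs, n=2, min_programs=2):
    """Keep sequences seen in at least min_programs programs (assumed cutoff)."""
    doc_freq = Counter()
    for opcodes in programs:
        doc_freq.update(set(instruction_ngrams(opcodes, n)))
    return sorted(seq for seq, df in doc_freq.items() if df >= min_programs)


def to_features(opcodes, vocabulary, n=2):
    """Binary vector: 1.0 if the sequence occurs in the program, else 0.0."""
    present = instruction_ngrams(opcodes, n)
    return [1.0 if seq in present else 0.0 for seq in vocabulary]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(features, weights, bias):
    """Estimated probability that the program is malicious."""
    return sigmoid(bias + sum(w * f for w, f in zip(weights, features)))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Plain stochastic gradient descent on the logistic loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            error = predict(features, weights, bias) - label
            weights = [w - lr * error * f for w, f in zip(weights, features)]
            bias -= lr * error
    return weights, bias


# Hypothetical toy data: made-up opcode streams, not taken from the paper.
malicious = [["push", "call", "xor", "jmp", "xor", "jmp"],
             ["xor", "jmp", "push", "call", "xor"]]
clean = [["mov", "add", "mov", "ret"],
         ["push", "call", "mov", "add", "ret"]]

programs = malicious + clean
labels = [1.0] * len(malicious) + [0.0] * len(clean)

vocabulary = build_vocabulary(programs)
X = [to_features(p, vocabulary) for p in programs]
weights, bias = train_logistic_regression(X, labels)

# Score two programs that were not part of the training data.
unseen_suspicious = ["push", "call", "xor", "jmp"]
unseen_benign = ["mov", "add", "ret"]
p_suspicious = predict(to_features(unseen_suspicious, vocabulary), weights, bias)
p_benign = predict(to_features(unseen_benign, vocabulary), weights, bias)
```

In this sketch a score above 0.5 would be read as "malicious". A realistic setup would disassemble real binaries, select features statistically, and evaluate on a held-out set, as the abstract's reference to programs "not used in the model building process" implies.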