New malicious code detection using variable length n-grams

Authors:
D. Krishna Sandeep Reddy; Subrat Kumar Dash; Arun K. Pujari
Affiliations:
Artificial Intelligence Lab, University of Hyderabad, Hyderabad, India (all three authors)
Venue:
ICISS'06 Proceedings of the Second international conference on Information Systems Security
Year:
2006

References (17):
Data mining: practical machine learning tools and techniques with Java implementations.
Characterizing the behavior of a program using multiple-length N-grams. Proceedings of the 2000 workshop on New security paradigms.
Machine Learning.
Attacking Malicious Code: A Report to the Infosec Research Council. IEEE Software.
A Comparative Study on Feature Selection in Text Categorization. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning.
MEF: Malicious Email Filter - A UNIX Mail Filter That Detects Malicious Windows Executables. Proceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference.
An Unsupervised Algorithm for Segmenting Categorical Timeseries into Episodes. Proceedings of the ESF Exploratory Workshop on Pattern Detection and Discovery.
Segmenting time series with a hybrid neural networks - hidden Markov model. Eighteenth national conference on Artificial intelligence.
Data Mining Methods for Detection of New Malicious Executables. SP '01 Proceedings of the 2001 IEEE Symposium on Security and Privacy.
Learning to detect malicious executables in the wild. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.
The Art of Computer Virus Research and Defense.
Multi-resolution Abnormal Trace Detection Using Varied-length N-grams and Automata. ICAC '05 Proceedings of the Second International Conference on Autonomic Computing.
Static analysis of executables to detect malicious patterns. SSYM'03 Proceedings of the 12th conference on USENIX Security Symposium - Volume 12.
Detecting targeted attacks using shadow honeypots. SSYM'05 Proceedings of the 14th conference on USENIX Security Symposium - Volume 14.
Fixed- vs. variable-length patterns for detecting suspicious process behavior. Journal of Computer Security.
Biologically inspired defenses against computer viruses. IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1.
Episode based masquerade detection. ICISS'05 Proceedings of the First international conference on Information Systems Security.

Cited by (5):
Graph-based malware detection using dynamic analysis. Journal in Computer Virology.
FORECAST: skimming off the malware cream. Proceedings of the 27th Annual Computer Security Applications Conference.
A graph mining approach for detecting unknown malwares. Journal of Visual Languages and Computing.
Using file relationships in malware classification. DIMVA'12 Proceedings of the 9th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment.
A static, packer-agnostic filter to detect similar malware samples. DIMVA'12 Proceedings of the 9th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment.

Abstract

Most commercial antivirus software fails to detect unknown and new malicious code. Generic virus detection is a viable option for handling this problem, and a generic virus detector needs features that are common to viruses. Recently, Kolter et al. [16] proposed an efficient generic virus detector that uses n-grams as features. The fixed-length n-grams used there cannot capture meaningful sequences of different lengths. In this paper, we propose a new method for extracting variable-length n-grams based on the concept of episodes, and we demonstrate that these n-grams outperform fixed-length n-grams in malicious code detection. The proposed algorithm requires only two scans over the whole data set, whereas most classical algorithms require a number of scans proportional to the maximum n-gram length.
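To make the cost comparison concrete, here is a minimal sketch of the two baselines the abstract refers to. It assumes byte-level n-grams (the abstract does not state the unit) and an Apriori-style, level-wise miner in which every n-gram length needs its own full scan of the data set. It is not the episode-based extraction proposed in the paper, whose details the abstract does not give. The function names `fixed_ngrams` and `levelwise_frequent_ngrams` and the `min_support` threshold (number of files containing a gram) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def fixed_ngrams(data: bytes, n: int) -> Counter:
    """Count every overlapping n-byte window in `data` (fixed-length n-grams)."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def levelwise_frequent_ngrams(
    corpus: List[bytes], min_support: int, max_len: int
) -> Tuple[Dict[bytes, int], int]:
    """Hypothetical Apriori-style miner for frequent n-grams of length 1..max_len.

    Support is the number of files that contain a gram. Each length k costs
    one full pass over the corpus, so the number of scans grows with max_len.
    Returns (frequent grams with their support, number of scans performed).
    """
    frequent: Dict[bytes, int] = {}
    prev_level: Optional[Dict[bytes, int]] = None  # frequent (k-1)-grams
    scans = 0
    for k in range(1, max_len + 1):
        counts: Counter = Counter()
        scans += 1  # one complete scan of the corpus for length k
        for data in corpus:
            seen = set()  # count each gram at most once per file
            for i in range(len(data) - k + 1):
                gram = data[i:i + k]
                # Apriori pruning: both (k-1)-length sub-grams must be frequent.
                if prev_level is not None and (
                    gram[:-1] not in prev_level or gram[1:] not in prev_level
                ):
                    continue
                seen.add(gram)
            counts.update(seen)
        level = {g: c for g, c in counts.items() if c >= min_support}
        if not level:
            break  # no frequent grams of length k, so none longer either
        frequent.update(level)
        prev_level = level
    return frequent, scans
```

For example, with the corpus `[b"abcab", b"abcd", b"xabc"]`, `min_support=2` and `max_len=4`, the miner keeps `a`, `b`, `c`, `ab`, `bc` and `abc` and performs four scans: one per length, including the final scan that finds no frequent 4-gram. Under these assumptions, the number of passes rises with the maximum length, which is the cost that a two-scan method avoids.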