New malicious code detection using variable length n-grams

  • Authors:
  • D. Krishna Sandeep Reddy;Subrat Kumar Dash;Arun K. Pujari

  • Affiliations:
  • Artificial Intelligence Lab, University of Hyderabad, Hyderabad, India;Artificial Intelligence Lab, University of Hyderabad, Hyderabad, India;Artificial Intelligence Lab, University of Hyderabad, Hyderabad, India

  • Venue:
  • ICISS'06 Proceedings of the Second International Conference on Information Systems Security
  • Year:
  • 2006

Abstract

Most commercial antivirus software fails to detect new and unknown malicious code. Generic virus detection is a viable way to handle this problem. A generic virus detector needs features that are common to viruses. Recently, Kolter et al. [16] proposed an efficient generic virus detector that uses n-grams as features. The fixed-length n-grams used there suffer from the drawback that they cannot capture meaningful sequences of different lengths. In this paper, we propose a new method of variable-length n-gram extraction based on the concept of episodes and demonstrate that variable-length n-grams outperform fixed-length n-grams in malicious code detection. The proposed algorithm requires only two scans over the whole data set, whereas most classical algorithms require a number of scans proportional to the maximum n-gram length.
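To make the fixed-length limitation concrete, the following is a minimal Python sketch of byte-level n-gram extraction with a per-sample document-frequency count. It is not the episode-based algorithm of the paper, which the abstract does not specify in detail. The function names `byte_ngrams` and `document_frequency` and the toy byte strings are assumptions made for illustration only.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int):
    """Yield every contiguous n-byte window of data."""
    for i in range(len(data) - n + 1):
        yield data[i:i + n]


def document_frequency(samples, n):
    """Count in how many samples each n-gram occurs (one scan of the data set)."""
    df = Counter()
    for sample in samples:
        # set() ensures each n-gram is counted at most once per sample.
        df.update(set(byte_ngrams(sample, n)))
    return df


# Toy byte strings standing in for executables (hypothetical data).
samples = [
    b"\x90\x90\xeb\xfe\x00",
    b"\x01\x90\x90\xeb\xfe",
    b"\x90\x90\x02\x03",
]

# Each fixed length needs its own scan in this naive approach.
df2 = document_frequency(samples, 2)
df4 = document_frequency(samples, 4)

print(df2[b"\x90\x90"])          # 2-byte pattern shared by all three samples
print(df4[b"\x90\x90\xeb\xfe"])  # 4-byte pattern shared by two samples
```

With a single fixed n, only one of these two patterns can serve as a feature: n = 2 splits the four-byte sequence into fragments, while n = 4 misses the two-byte sequence in the third sample. Covering every length up to some maximum with this naive approach costs one scan per length, which is the cost the abstract contrasts with a two-scan method.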