Embedded Malware Detection Using Markov n-Grams

  • Authors: M. Zubair Shafiq; Syed Ali Khayam; Muddassar Farooq

  • Affiliations: Next Generation Intelligent Networks Research Center (nexGINRC), National University of Computer & Emerging Sciences (NUCES), Islamabad, Pakistan; School of Electrical Engineering & Computer Science (SEECS), National University of Sciences & Technology (NUST), Rawalpindi, Pakistan; Next Generation Intelligent Networks Research Center (nexGINRC), National University of Computer & Emerging Sciences (NUCES), Islamabad, Pakistan

  • Venue: DIMVA '08: Proceedings of the 5th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment
  • Year: 2008

Abstract

Embedded malware is a recently discovered security threat in which malcode is hidden inside a benign file. It has been shown that commercial antivirus software fails to detect embedded malware even when the malware's signature is present in the antivirus database. In this paper, we present a novel anomaly detection scheme to detect embedded malware. We first analyze byte sequences in benign files and show that they generally exhibit a first-order dependence structure. Consequently, conditional n-grams provide a more meaningful representation of a file's statistical properties than traditional n-grams. To capture and leverage this correlation structure, we model the conditional distributions as Markov n-grams. We then use an information-theoretic measure, the entropy rate, to quantify changes in the Markov n-gram distributions observed within a file. We show that the entropy rate of Markov n-grams is significantly perturbed at malcode embedding locations and can therefore act as a robust feature for embedded malware detection. We evaluate the proposed Markov n-gram detector on a comprehensive dataset consisting of more than 37,000 malware samples and 1,800 benign samples of six well-known file types. We show that the Markov n-gram detector achieves a higher detection rate and a lower false positive rate than the only existing embedded malware detection scheme.
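
To make the entropy-rate feature concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a first-order Markov model over single bytes, estimates the entropy rate of a block as the empirical conditional entropy H(X_t | X_{t-1}) in bits per byte, and computes it over fixed-size, non-overlapping blocks of a file. The function names and the block size of 1000 bytes are hypothetical choices made for illustration; the abstract does not specify them.

```python
import math
import random
from collections import Counter
from typing import List


def markov_entropy_rate(data: bytes) -> float:
    """Empirical entropy rate (bits per byte) of a first-order Markov model.

    Assumption: the stationary distribution is approximated by the observed
    frequency of each preceding byte, so the result equals the empirical
    conditional entropy H(next byte | previous byte).
    """
    if len(data) < 2:
        return 0.0
    pair_counts = Counter(zip(data, data[1:]))  # counts of (previous, next)
    prev_counts = Counter(data[:-1])            # counts of previous byte
    total = len(data) - 1                       # number of transitions
    rate = 0.0
    for (prev, _nxt), count in pair_counts.items():
        joint = count / total                   # p(previous, next)
        conditional = count / prev_counts[prev] # p(next | previous)
        rate -= joint * math.log2(conditional)
    return rate


def windowed_entropy_rates(data: bytes, window: int = 1000) -> List[float]:
    """Entropy rate of each non-overlapping block (hypothetical block size)."""
    return [
        markov_entropy_rate(data[start:start + window])
        for start in range(0, len(data), window)
    ]


if __name__ == "__main__":
    # Synthetic stand-in for a file: a highly regular region, a block of
    # pseudo-random bytes playing the role of embedded content, then
    # another regular region.
    rng = random.Random(0)
    noisy_block = bytes(rng.getrandbits(8) for _ in range(1000))
    sample = b"\x00" * 1000 + noisy_block + b"\x00" * 1000
    for index, rate in enumerate(windowed_entropy_rates(sample)):
        print(f"block {index}: {rate:.3f} bits/byte")
```

In this synthetic example the middle block has a noticeably higher entropy rate than its neighbours, which mirrors the perturbation the abstract describes at embedding locations. A real detector would also need a decision rule for flagging such perturbations, which this sketch does not attempt to reproduce.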