Opcode sequences as representation of executables for data-mining-based unknown malware detection

  • Authors:
  • Igor Santos, Felix Brezo, Xabier Ugarte-Pedrero, Pablo G. Bringas

  • Affiliations:
  • University of Deusto, Laboratory for Smartness, Semantics and Security (S3Lab), Avenida de las Universidades 24, 48007 Bilbao, Spain (all authors)

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2013

Abstract

Malware can be defined as any type of malicious code that has the potential to harm a computer or network. The volume of malware is growing faster every year and poses a serious global security threat. Consequently, malware detection has become a critical topic in computer security. Currently, signature-based detection is the most widespread method used in commercial antivirus software. Despite its broad use, this method can detect malware only after the malicious executable has already caused damage, and only if the malware has been adequately documented. Therefore, the signature-based method consistently fails to detect new malware. In this paper, we propose a new method to detect unknown malware families. The method is based on the frequency of appearance of opcode sequences. Furthermore, we describe a technique to mine the relevance of each opcode and to assess the frequency of each opcode sequence. In addition, we provide empirical validation that this new method is capable of detecting unknown malware.
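The abstract does not give the exact formulas, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that an executable has already been disassembled into a list of opcode mnemonics, that an "opcode sequence" is a contiguous run of `n` opcodes, that a sequence's frequency is its count divided by the total number of sequences in the executable, and that the per-opcode "relevance" is supplied as a precomputed weight. The functions `opcode_ngrams`, `sequence_frequencies`, and `weighted_frequencies`, along with the example weights, are hypothetical illustrations and not the authors' implementation.

```python
from collections import Counter
from typing import Dict, List, Mapping, Sequence, Tuple

OpSeq = Tuple[str, ...]


def opcode_ngrams(opcodes: Sequence[str], n: int) -> List[OpSeq]:
    """Return every contiguous opcode sequence of length n (hypothetical helper)."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def sequence_frequencies(opcodes: Sequence[str], n: int) -> Dict[OpSeq, float]:
    """Frequency of each sequence: its count divided by the total number of sequences."""
    grams = opcode_ngrams(opcodes, n)
    total = len(grams)
    if total == 0:
        # Fewer than n opcodes: no sequence of length n exists.
        return {}
    counts = Counter(grams)
    return {seq: count / total for seq, count in counts.items()}


def weighted_frequencies(
    freqs: Mapping[OpSeq, float], relevance: Mapping[str, float]
) -> Dict[OpSeq, float]:
    """Scale each frequency by the product of its opcodes' relevance weights.

    ASSUMPTION: relevance weights are given externally (e.g. mined from a
    labelled corpus); opcodes without a weight are treated as irrelevant (0.0).
    """
    weighted = {}
    for seq, freq in freqs.items():
        weight = 1.0
        for op in seq:
            weight *= relevance.get(op, 0.0)
        weighted[seq] = freq * weight
    return weighted


if __name__ == "__main__":
    ops = ["push", "mov", "call", "push", "mov", "ret"]
    freqs = sequence_frequencies(ops, 2)
    # Hypothetical relevance weights, for illustration only.
    relevance = {"push": 0.5, "mov": 0.8, "call": 1.0, "ret": 0.2}
    for seq, value in sorted(weighted_frequencies(freqs, relevance).items()):
        print(seq, round(value, 3))
```

In this toy input, `("push", "mov")` occurs twice among five bigrams, so its raw frequency is 0.4. Multiplying by the opcode weights lets rare but informative opcodes dominate a sequence's score. The resulting per-executable vectors could then be passed to a standard classifier, which is the general data-mining setting the title refers to.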