Opcode-sequence-based semi-supervised unknown malware detection

  • Authors:
  • Igor Santos; Borja Sanz; Carlos Laorden; Felix Brezo; Pablo G. Bringas

  • Affiliations:
  • S3Lab, DeustoTech - Computing, Deusto Institute of Technology, University of Deusto, Bilbao, Spain (all authors)

  • Venue:
  • CISIS'11: Proceedings of the 4th International Conference on Computational Intelligence in Security for Information Systems
  • Year:
  • 2011

Abstract

Malware is any computer software potentially harmful to both computers and networks. The amount of malware grows every year and poses a serious global security threat. Signature-based detection is the most widely used method in commercial antivirus software; however, it consistently fails to detect new malware. Supervised machine learning has been adopted to address this issue, but its usefulness is far from complete because it requires a large number of malicious executables and benign programs to be identified and labelled beforehand. In this paper, we propose a new malware detection method that adopts a well-known semi-supervised learning approach to detect unknown malware. The method examines how frequently opcode sequences appear in each executable and uses these frequencies to build a semi-supervised machine-learning classifier from a set of labelled instances (either malware or legitimate software) and unlabelled instances. We performed an empirical validation demonstrating that the labelling effort is lower than with supervised learning while the system maintains a high accuracy rate.
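The abstract outlines the pipeline (opcode-sequence frequencies as features, then a classifier trained on both labelled and unlabelled executables) but does not name the semi-supervised algorithm. The sketch below is illustrative only and assumes: opcode bigrams (n = 2) weighted by relative frequency, cosine similarity between programs, and a simple 1-nearest-neighbour self-training loop with a similarity threshold of 0.5. The self-training scheme, the threshold, the function names (`opcode_ngram_frequencies`, `self_train`, `classify`) and the toy opcode sequences are all hypothetical stand-ins, not the authors' method or data.

```python
from collections import Counter
from math import sqrt


def opcode_ngram_frequencies(opcodes, n=2):
    """Relative frequency of every length-n opcode sequence in one disassembled program."""
    grams = [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]
    if not grams:
        return {}
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()}


def cosine(a, b):
    """Cosine similarity between two sparse frequency vectors (dicts)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(labelled, x):
    """Return (similarity, label) of the labelled instance most similar to x."""
    return max(((cosine(x, f), lab) for f, lab in labelled), key=lambda t: t[0])


def self_train(labelled, unlabelled, threshold=0.5):
    """Hypothetical 1-NN self-training: pseudo-label the most confident sample each round."""
    labelled, pool = list(labelled), list(unlabelled)
    while pool:
        # Score every unlabelled instance by its closest labelled neighbour.
        scored = [(nearest(labelled, x), i) for i, x in enumerate(pool)]
        (sim, label), idx = max(scored, key=lambda t: t[0][0])
        if sim < threshold:
            break  # nothing left is similar enough to trust a pseudo-label
        labelled.append((pool.pop(idx), label))
    return labelled, pool


def classify(labelled, x):
    """Label a new executable with the label of its nearest labelled neighbour."""
    return nearest(labelled, x)[1]


# Toy example: two labelled programs, three unlabelled ones (invented opcode traces).
labelled = [
    (opcode_ngram_frequencies(["mov", "xor", "xor", "jmp", "mov", "xor"]), "malware"),
    (opcode_ngram_frequencies(["push", "mov", "call", "ret", "push", "mov"]), "benign"),
]
unlabelled = [opcode_ngram_frequencies(s) for s in (
    ["mov", "xor", "jmp", "mov", "xor"],
    ["push", "mov", "call", "ret", "push"],
    ["nop", "nop", "nop"],
)]
model, leftover = self_train(labelled, unlabelled)
print(len(model), len(leftover))  # 4 1
print(classify(model, opcode_ngram_frequencies(["xor", "xor", "jmp"])))  # malware
```

Self-training is used here only because it is the shortest semi-supervised scheme to show: the two unlabelled traces that resemble a labelled one are absorbed into the training set without manual labelling, while the unrelated `nop` trace stays unlabelled. This mirrors the abstract's point that unlabelled instances can reduce labelling effort, but the actual algorithm, features and thresholds in the paper may differ.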