VILO: a rapid learning nearest-neighbor classifier for malware triage

  • Authors:
  • Arun Lakhotia;Andrew Walenstein;Craig Miles;Anshuman Singh

  • Affiliations:
  • Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, USA;School of Computing and Informatics, University of Louisiana at Lafayette, Lafayette, USA;Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, USA;Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, USA

  • Venue:
  • Journal in Computer Virology
  • Year:
  • 2013

Abstract

VILO is a lazy learner system designed for malware classification and triage. It implements a nearest neighbor (NN) algorithm with similarities computed over Term Frequency × Inverse Document Frequency (TFIDF) weighted opcode mnemonic permutation features (N-perms). Being an NN-classifier, VILO makes minimal structural assumptions about class boundaries, and is thus well suited to the constantly changing malware population. This paper presents an extensive study of the application of VILO in malware analysis. Our experiments demonstrate that (a) VILO is a rapid learner of malware families, i.e., VILO's learning curve stabilizes at high accuracies quickly (training on fewer than 20 variants per family is sufficient); (b) similarity scores derived from TFIDF weighted features should primarily be treated as ordinal measurements; and (c) VILO with N-perm feature vectors outperforms VILO with traditional N-gram feature vectors when used to classify real-world malware into their respective families.
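
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not VILO's implementation: it assumes that an N-perm is a window of N consecutive opcode mnemonics with the order inside the window ignored, that similarity is cosine similarity, and that IDF is smoothed in a common way. The abstract specifies none of these details, and all function names and the toy opcode sequences are hypothetical.

```python
import math
from collections import Counter


def n_perms(opcodes, n=2):
    """Order-insensitive n-grams (assumed reading of 'N-perms'):
    each window of n mnemonics is sorted, so ordering inside it is ignored."""
    return [tuple(sorted(opcodes[i:i + n])) for i in range(len(opcodes) - n + 1)]


def tfidf_vectors(docs, n=2):
    """Return one sparse TFIDF vector (dict) per document, plus the IDF table."""
    counts = [Counter(n_perms(doc, n)) for doc in docs]
    doc_freq = Counter(feature for c in counts for feature in c)
    total = len(docs)
    # Smoothed IDF; the exact weighting used by VILO is an assumption here.
    idf = {f: math.log((1 + total) / (1 + k)) + 1.0 for f, k in doc_freq.items()}
    vectors = [{f: tf * idf[f] for f, tf in c.items()} for c in counts]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(query, train_docs, labels, n=2):
    """Lazy 1-NN: no model is fitted; the query takes the label of the
    most similar stored sample."""
    vectors, idf = tfidf_vectors(train_docs, n)
    query_counts = Counter(n_perms(query, n))
    # Features never seen in the stored samples carry no weight.
    q = {f: tf * idf[f] for f, tf in query_counts.items() if f in idf}
    best = max(range(len(vectors)), key=lambda i: cosine(q, vectors[i]))
    return labels[best]


# Hypothetical toy data: two stored samples from two made-up families.
train = [["push", "mov", "call", "ret"], ["xor", "add", "jmp", "xor"]]
families = ["famA", "famB"]
# The query reorders the first two instructions of the famA sample.
predicted = classify(["mov", "push", "call", "ret"], train, families)
```

Because each window is sorted, the reordered pair `mov, push` yields the same feature as `push, mov`, which is the property that would distinguish N-perms from plain N-grams under this reading. Only the ranking of the similarity scores is used to pick the neighbor, in line with the abstract's point that the scores are best treated as ordinal.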