Pattern recognition techniques for the classification of malware packers

Authors:
Li Sun; Steven Versteeg; Serdar Boztaş; Trevor Yann
Affiliations:
School of Mathematical and Geospatial Sciences, RMIT University, Melbourne, Australia; CA Labs, Melbourne, Australia; School of Mathematical and Geospatial Sciences, RMIT University, Melbourne, Australia; HCL Australia, Melbourne, Australia
Venue:
ACISP'10: Proceedings of the 15th Australasian Conference on Information Security and Privacy
Year:
2010

Abstract

Packing is the most common obfuscation method used by malware writers to hinder malware detection and analysis. The number of new packers and of variants of existing ones has increased dramatically, and packers increasingly employ sophisticated anti-unpacker tricks and obfuscation methods. This makes it difficult, costly and time-consuming for antivirus (AV) researchers to apply traditional static packer identification and classification methods, which rely mainly on the packer's byte signature. In this paper, we present a simple, yet fast and effective packer classification framework that applies pattern recognition techniques to automatically extracted randomness profiles of packers. The system runs without manual input from AV researchers. We evaluate several statistical classification algorithms, including k-Nearest Neighbor, Best-first Decision Tree, Sequential Minimal Optimization and Naive Bayes, on a large data set consisting of clean packed files and 17,336 real malware samples. Experimental results demonstrate that our packer classification system achieves extremely high effectiveness (>99%). The experiments also confirm that the randomness profile used in the system is a very strong feature for packer classification and that the system can be applied with high accuracy to real malware samples.
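The abstract describes a two-stage pipeline: extract a randomness profile from each packed file, then pass the profiles to a statistical classifier. The following is a minimal sketch of that shape under stated assumptions, not the authors' implementation. It approximates the randomness profile with sliding-window Shannon entropy (the paper's own randomness measure may differ) and fixes the profile length by truncating or zero-padding, which is a hypothetical choice. Of the four classifiers named, it uses k-Nearest Neighbor with Euclidean distance. The helper names (`window_entropy`, `randomness_profile`, `knn_classify`) and the parameters `window`, `step`, `points` and `k` are illustrative, and the toy training data are synthetic.

```python
from __future__ import annotations

import math
from collections import Counter


def window_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    counts = Counter(chunk)  # iterating bytes yields ints 0..255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def randomness_profile(data: bytes, window: int = 256, step: int = 128,
                       points: int = 64) -> list[float]:
    """Hypothetical randomness profile (assumption, not the paper's test):
    entropy of each sliding window, truncated or zero-padded to exactly
    `points` values so files of different sizes share one feature space."""
    starts = range(0, max(len(data) - window, 0) + 1, step)
    profile = [window_entropy(data[i:i + window]) for i in starts][:points]
    return profile + [0.0] * (points - len(profile))


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query: list[float],
                 training: list[tuple[list[float], str]], k: int = 3) -> str:
    """k-Nearest Neighbor: majority vote over the k training profiles closest
    to `query`; ties go to the label met first in distance order."""
    nearest = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic toy data, NOT from the paper: "packer_A" output starts with a
    # low-entropy region and ends with a high-entropy one; "packer_B" is the reverse.
    high = bytes(range(256)) * 4
    training = [
        (randomness_profile(b"\x00" * 1024 + high), "packer_A"),
        (randomness_profile(b"\x01" * 1024 + high), "packer_A"),
        (randomness_profile(high + b"\x00" * 1024), "packer_B"),
        (randomness_profile(high + b"\x02" * 1024), "packer_B"),
    ]
    query = randomness_profile(b"\x03" * 1000 + high)
    print(knn_classify(query, training, k=3))  # packer_A
```

Because every file maps to a fixed-length vector, the same profiles could instead be passed to a decision tree, a Naive Bayes model or a support vector machine trained with SMO; k-NN is used here only because it needs no training step and fits in a few lines.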