Malware detection using adaptive data compression

Authors:
Yan Zhou; W. Meador Inge
Affiliations:
University of South Alabama, Mobile, AL, USA; University of South Alabama, Mobile, AL, USA
Venue:
Proceedings of the 1st ACM Workshop on AISec
Year:
2008

Abstract

A popular approach in current commercial anti-malware software detects malicious programs by searching program code for scan strings, which are byte sequences indicative of malicious code. The scan strings, also known as the signatures of existing malware, are extracted by malware analysts from known malware samples and stored in a database often referred to as a virus dictionary. This process often requires significant human effort. In addition, the technique has two major limitations. First, not all malicious programs have bit patterns that are evidence of their malicious nature. Therefore, some malware is not recorded in the virus dictionary and cannot be detected through signature matching. Second, searching for specific bit patterns does not work on obfuscated malware, which can take many forms. Signature matching has been shown to be incapable of identifying new malware patterns and fails to recognize obfuscated malware. This paper presents a malware detection technique that discovers malware by means of a learning engine trained on a set of malware instances and a set of benign code instances. The learning engine uses an adaptive data compression model, prediction by partial matching (PPM), to build two compression models, one from the malware instances and the other from the benign code instances. A code instance is classified as either "malware" or "benign" by minimizing its estimated cross entropy; that is, it is assigned to the class whose model compresses it better. Our preliminary results are promising: we achieved a true positive rate of about 0.94 with a false positive rate as low as 0.016. Our experiments also demonstrate that this technique can effectively detect unknown and obfuscated malware.
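
The classification rule described in the abstract (two class-specific compression models, with the label chosen by the lower estimated cross entropy) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the PPM order, escape method, or training data, so the sketch swaps in a much simpler fixed-order byte context model with add-one smoothing in place of PPM. The names `ContextModel` and `classify`, the order of 2, and the toy training bytes are all assumptions made for illustration.

```python
import math


class ContextModel:
    """Fixed-order byte model with add-one smoothing.

    Assumption: a simplified stand-in for PPM. Real PPM blends several
    context orders through escape probabilities; this model does not.
    """

    def __init__(self, order: int = 2, alphabet_size: int = 256):
        self.order = order                  # context length in bytes (assumed)
        self.alphabet_size = alphabet_size  # number of possible byte values
        self.counts = {}                    # context -> {next byte -> count}
        self.totals = {}                    # context -> total observations

    def _context(self, data: bytes, i: int) -> bytes:
        # Up to `order` bytes preceding position i (shorter near the start).
        return data[max(0, i - self.order):i]

    def train(self, data: bytes) -> None:
        """Update the counts with every (context, next byte) pair in data."""
        for i in range(len(data)):
            ctx = self._context(data, i)
            followers = self.counts.setdefault(ctx, {})
            followers[data[i]] = followers.get(data[i], 0) + 1
            self.totals[ctx] = self.totals.get(ctx, 0) + 1

    def cross_entropy(self, data: bytes) -> float:
        """Estimated bits per byte needed to encode data under this model."""
        if not data:
            return 0.0
        bits = 0.0
        for i in range(len(data)):
            ctx = self._context(data, i)
            seen = self.counts.get(ctx, {}).get(data[i], 0)
            total = self.totals.get(ctx, 0)
            # Add-one smoothing keeps unseen bytes at nonzero probability.
            p = (seen + 1) / (total + self.alphabet_size)
            bits -= math.log2(p)
        return bits / len(data)


def classify(sample: bytes, malware_model: ContextModel,
             benign_model: ContextModel) -> str:
    """Return the label of the model with the lower cross entropy.

    Ties go to "benign" (an arbitrary choice in this sketch).
    """
    h_malware = malware_model.cross_entropy(sample)
    h_benign = benign_model.cross_entropy(sample)
    return "malware" if h_malware < h_benign else "benign"


# Toy training data, invented for illustration only.
malware_model = ContextModel()
malware_model.train(b"\x90\x90\xeb\xfe" * 50)
benign_model = ContextModel()
benign_model.train(b"hello world " * 50)

print(classify(b"\x90\x90\xeb\xfe\x90\x90", malware_model, benign_model))
print(classify(b"hello world", malware_model, benign_model))
```

The model that has seen byte patterns similar to the sample assigns them higher probability, so it needs fewer bits per byte to encode the sample and wins the comparison. A full PPM model would follow the same decision rule but estimate the probabilities by backing off across context orders.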