A popular approach in current commercial anti-malware software detects malicious programs by searching program code for scan strings, which are byte sequences indicative of malicious code. The scan strings, also known as the signatures of existing malware, are extracted by malware analysts from known malware samples and stored in a database often referred to as a virus dictionary. This process often involves a significant amount of human effort. In addition, the technique has two major limitations. First, not all malicious programs have bit patterns that are evidence of their malicious nature. Such malware is therefore not recorded in the virus dictionary and cannot be detected through signature matching. Second, searching for specific bit patterns does not work on obfuscated malware, which can take many forms. Signature matching has been shown to be incapable of identifying new malware patterns and of recognizing obfuscated malware. This paper presents a malware detection technique that discovers malware by means of a learning engine trained on a set of malware instances and a set of benign code instances. The learning engine uses an adaptive data compression model, prediction by partial matching (PPM), to build two compression models, one from the malware instances and the other from the benign code instances. A code instance is classified as either "malware" or "benign" by minimizing its estimated cross entropy; that is, it is assigned to the class whose compression model yields the lower estimated cross entropy. Our preliminary results are very promising: we achieved a true positive rate of about 0.94 with a false positive rate as low as 0.016. Our experiments also demonstrate that this technique can effectively detect unknown and obfuscated malware.
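The classification rule described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: in place of full PPM it uses a fixed-order byte context model with add-one smoothing, which is a much simpler stand-in without PPM's escape mechanism or blending across context orders. The names `ContextModel` and `classify`, the order of 2, and the toy byte strings are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class ContextModel:
    """Order-k byte model with add-one smoothing.

    A simplified stand-in for PPM (assumption): no escape symbols and
    no blending of different context lengths.
    """

    def __init__(self, order: int = 2) -> None:
        self.order = order
        # counts[context][byte] = times `byte` followed `context` in training
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def train(self, data: bytes) -> None:
        for i in range(len(data)):
            ctx = data[max(0, i - self.order):i]
            self.counts[ctx][data[i]] += 1
            self.totals[ctx] += 1

    def cross_entropy(self, data: bytes) -> float:
        """Estimated bits per byte needed to encode `data` with this model."""
        if not data:
            return 0.0
        bits = 0.0
        for i in range(len(data)):
            ctx = data[max(0, i - self.order):i]
            seen = self.counts.get(ctx)  # .get avoids creating new entries
            count = seen.get(data[i], 0) if seen else 0
            total = self.totals.get(ctx, 0)
            prob = (count + 1) / (total + 256)  # add-one over 256 byte values
            bits -= math.log2(prob)
        return bits / len(data)


def classify(sample: bytes, malware_model: ContextModel,
             benign_model: ContextModel) -> str:
    """Assign the label whose model gives the lower estimated cross entropy."""
    h_malware = malware_model.cross_entropy(sample)
    h_benign = benign_model.cross_entropy(sample)
    return "malware" if h_malware < h_benign else "benign"


if __name__ == "__main__":
    # Toy training data (hypothetical byte patterns, not real samples).
    malware_model, benign_model = ContextModel(), ContextModel()
    malware_model.train(b"\x90\x90\xeb\xfe" * 50)
    benign_model.train(b"\x55\x89\xe5\x5d\xc3" * 50)
    print(classify(b"\x90\x90\xeb\xfe\x90\x90\xeb\xfe", malware_model, benign_model))
```

The design point the sketch preserves is that one model is trained per class and no signature database is consulted: a sample is labeled by whichever model compresses it better. In this sketch ties go to "benign", which is an arbitrary choice.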