Malware detection using statistical analysis of byte-level file content

Authors:
S. Momina Tabish;M. Zubair Shafiq;Muddassar Farooq
Affiliations:
National University of Computer & Emerging Sciences (FAST-NUCES), Islamabad, Pakistan;National University of Computer & Emerging Sciences (FAST-NUCES), Islamabad, Pakistan;National University of Computer & Emerging Sciences (FAST-NUCES), Islamabad, Pakistan
Venue:
Proceedings of the ACM SIGKDD Workshop on CyberSecurity and Intelligence Informatics
Year:
2009

Abstract

Commercial anti-virus software is unable to provide protection against newly launched (a.k.a. "zero-day") malware. In this paper, we propose a novel malware detection technique based on the analysis of byte-level file content. The novelty of our approach, compared with existing content-based mining schemes, is that it does not memorize specific byte sequences or strings appearing in the actual file content. Our technique is non-signature based and therefore has the potential to detect previously unknown and zero-day malware. We compute a wide range of statistical and information-theoretic features in a block-wise manner to quantify the byte-level file content. We leverage standard data mining algorithms to classify the file content of every block as normal or potentially malicious. Finally, we correlate the block-wise classification results of a given file to categorize it as benign or malware. Because the proposed scheme operates on byte-level file content, it does not require any a priori information about the file type. We have tested our proposed technique using a benign dataset comprising six different file types (DOC, EXE, JPG, MP3, PDF and ZIP) and a malware dataset comprising six different malware types (backdoor, trojan, virus, worm, constructor and miscellaneous). We also perform a comparison with existing data mining based malware detection techniques. The results of our experiments show that the proposed non-signature based technique surpasses the existing techniques and achieves more than 90% detection accuracy.
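
The pipeline described in the abstract (block-wise features, per-block classification, file-level correlation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the block size, the exact feature set, the classifier, or the correlation rule, so the 1024-byte block size, the three features (byte entropy, mean, variance), the `block_classifier` callable, and the majority-style threshold are all assumptions made for illustration.

```python
import math
from collections import Counter

BLOCK_SIZE = 1024  # assumed block size; the abstract does not specify one


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split raw file content into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def byte_entropy(block: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not block:
        return 0.0
    n = len(block)
    counts = Counter(block)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def block_features(block: bytes) -> dict:
    """A small, illustrative feature vector for one non-empty block."""
    n = len(block)
    mean = sum(block) / n
    variance = sum((b - mean) ** 2 for b in block) / n
    return {"entropy": byte_entropy(block), "mean": mean, "variance": variance}


def classify_file(data: bytes, block_classifier, threshold: float = 0.5) -> str:
    """Label a file from its per-block verdicts.

    block_classifier is a hypothetical callable that maps a feature dict to
    True (potentially malicious) or False (normal). The fraction-over-threshold
    rule is an assumed stand-in for the correlation step.
    """
    blocks = split_blocks(data)
    if not blocks:
        return "benign"
    flagged = sum(1 for b in blocks if block_classifier(block_features(b)))
    return "malware" if flagged / len(blocks) > threshold else "benign"


def toy_classifier(features: dict) -> bool:
    """Placeholder only: a real system would use a trained model here."""
    return features["entropy"] > 7.0
```

In use, `classify_file(open(path, "rb").read(), toy_classifier)` would return a label without ever inspecting the file extension or format, which mirrors the abstract's point that no a priori file type information is needed. The entropy threshold in `toy_classifier` carries no empirical weight; it only stands in for whatever trained data mining model is plugged in.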