Combining file content and file relations for cloud based malware detection

Authors:
Yanfang Ye;Tao Li;Shenghuo Zhu;Weiwei Zhuang;Egemen Tas;Umesh Gupta;Melih Abdulhayoglu
Affiliations:
Comodo Security Solutions, Inc, Beijing, China;Florida International University, Miami, FL, USA;NEC Laboratories America, Cupertino, CA, USA;Xiamen University, Xiamen, China;Comodo Security Solutions, Inc, New Jersey, NJ, USA;Comodo Security Solutions, Inc, New Jersey, NJ, USA;Comodo Security Solutions, Inc, New Jersey, NJ, USA
Venue:
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2011

Abstract

Because malware (such as viruses, worms, trojans, spyware, backdoors, and rootkits) damages Internet security, its detection has held the attention not only of the anti-malware industry but also of researchers for decades. Data mining methods such as Naive Bayes and Support Vector Machines have been used for malware detection, relying on the analysis of file contents extracted from the file samples, such as Application Programming Interface (API) calls, instruction sequences, and binary strings. However, besides file contents, relations among file samples can provide invaluable information about the properties of those samples; for example, a "Downloader" is always associated with many Trojans. In this paper, we study how file relations can be used to improve malware detection results, and we develop a file verdict system (named "Valkyrie") that builds on a semi-parametric classifier model to combine file content and file relations for malware detection. To the best of our knowledge, this is the first work to use both file content and file relations for malware detection. We perform a comprehensive experimental study on a large collection of PE files, obtained from the clients of the anti-malware products of Comodo Security Solutions, Inc, to compare various malware detection approaches. Promising experimental results demonstrate that our Valkyrie system outperforms, in both accuracy and efficiency, popular anti-malware software tools such as Kaspersky AntiVirus and McAfee VirusScan, as well as alternative data mining based detection systems.