Because of the damage malware causes to Internet security, the detection of malware (such as viruses, worms, Trojans, spyware, backdoors, and rootkits) has drawn the attention of both the anti-malware industry and researchers for decades. Data mining methods such as Naive Bayes and Support Vector Machines have been used for malware detection, relying on the analysis of file contents extracted from file samples, such as Application Programming Interface (API) calls, instruction sequences, and binary strings. However, besides file contents, relations among file samples (for example, a "Downloader" is always associated with many Trojans) can provide invaluable information about the properties of file samples. In this paper, we study how file relations can be used to improve malware detection results and develop a file verdict system (named "Valkyrie") built on a semi-parametric classifier model that combines file content and file relations for malware detection. To the best of our knowledge, this is the first work to use both file content and file relations for malware detection. A comprehensive experimental study on a large collection of PE files obtained from the clients of anti-malware products of Comodo Security Solutions Incorporation is performed to compare various malware detection approaches. Promising experimental results demonstrate that our Valkyrie system outperforms other popular anti-malware software tools, such as Kaspersky AntiVirus and McAfee VirusScan, as well as alternative data mining based detection systems, in both accuracy and efficiency.