Malware clustering and classification are important tools that enable analysts to prioritize their malware analysis efforts. The recent emergence of fully automated methods for malware clustering and classification that report high accuracy suggests that this problem may largely be solved. In this paper, we report the results of our attempt to confirm our conjecture that the method of selecting ground-truth data in prior evaluations biases their results toward high accuracy. To examine this conjecture, we apply clustering algorithms from a different domain (plagiarism detection), first to the dataset used in a prior work's evaluation and then to a wholly new malware dataset, to see whether clustering algorithms developed without attention to the subtleties of malware obfuscation are nevertheless successful. While these studies provide conflicting signals as to the correctness of our conjecture, our investigation of possible reasons uncovers, we believe, a cautionary note regarding the significance of highly accurate clustering results: such results can be affected by testing on a dataset with a biased cluster-size distribution.
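The effect of a biased cluster-size distribution can be illustrated with a small synthetic sketch. The abstract does not define its accuracy measures, so the precision and recall definitions below (best-overlap counts averaged over all samples) are an assumption, as are the helper names `precision_recall` and `labels_from_sizes` and the cluster sizes. The numbers are made up for illustration and are not results from the paper.

```python
from collections import Counter


def precision_recall(predicted, reference):
    """Clustering precision and recall from best-overlap counts.

    Assumed definitions (not taken from the abstract):
      precision = (1/n) * sum over predicted clusters of their largest
                  overlap with any single reference cluster
      recall    = (1/n) * sum over reference clusters of their largest
                  overlap with any single predicted cluster
    """
    n = len(predicted)
    overlap = Counter(zip(predicted, reference))
    best_for_pred = {}
    best_for_ref = {}
    for (p, r), count in overlap.items():
        best_for_pred[p] = max(best_for_pred.get(p, 0), count)
        best_for_ref[r] = max(best_for_ref.get(r, 0), count)
    precision = sum(best_for_pred.values()) / n
    recall = sum(best_for_ref.values()) / n
    return precision, recall


def labels_from_sizes(sizes):
    """Build a label list in which cluster i has sizes[i] members."""
    return [i for i, size in enumerate(sizes) for _ in range(size)]


# Two hypothetical ground-truth labelings of 100 samples each.
skewed = labels_from_sizes([90, 5, 5])       # one dominant family
balanced = labels_from_sizes([34, 33, 33])   # families of similar size

# A deliberately uninformative "clustering": every sample in one cluster.
trivial = [0] * 100

skewed_scores = precision_recall(trivial, skewed)
balanced_scores = precision_recall(trivial, balanced)

print("skewed   precision=%.2f recall=%.2f" % skewed_scores)
print("balanced precision=%.2f recall=%.2f" % balanced_scores)
```

Under these assumed definitions, the uninformative single-cluster output scores far higher precision on the skewed ground truth than on the balanced one, with recall unchanged. This is one way a high reported accuracy could say more about the dataset's cluster sizes than about the algorithm.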