Automatic malware clustering plays a vital role in combating the rapidly growing number of malware variants. Most existing malware clustering algorithms operate on either static instruction features or dynamic behavior features to partition malware into families. However, these two approaches have different strengths and weaknesses in handling different types of malware. Moreover, different clustering algorithms, and even multiple runs of the same algorithm, may produce inconsistent or contradictory results. To remedy this heterogeneity and the lack of robustness of any single clustering algorithm, we propose a novel system called DUET, which exploits the complementary nature of static and dynamic clustering algorithms and optimally integrates their results. Using the concept of clustering ensembles, DUET combines the partitions produced by individual clustering algorithms into a single consensus partition with better quality and robustness. DUET improves on existing ensemble algorithms by incorporating cluster-quality measures to reconcile differences and contradictions between the base malware clusterings. Using real-world malware samples, we compare the performance of DUET (in terms of clustering precision, recall, and coverage) with that of individual state-of-the-art static and dynamic clustering components. Comprehensive experiments demonstrate that DUET improves the coverage of malware samples by 20-40% while keeping precision near the optimum achievable by any individual clustering algorithm.
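The idea of merging several base partitions into one consensus partition can be illustrated with a small, generic sketch. This is not DUET's actual algorithm, which the abstract does not specify. It is a plain weighted co-association (evidence accumulation) scheme written for illustration. The function name `consensus_partition`, the per-partition `weights` (standing in for a cluster-quality measure), the `threshold`, and the sample data are all assumptions.

```python
from itertools import combinations


def consensus_partition(partitions, weights, threshold=0.5):
    """Merge base partitions by weighted pairwise agreement (illustrative only).

    partitions: list of dicts mapping sample id -> cluster label. A sample
                missing from a dict is one that clusterer could not handle.
    weights:    one hypothetical quality weight per partition.
    threshold:  two samples are linked when the weighted share of partitions
                that co-cluster them exceeds this value.
    """
    samples = sorted(set().union(*partitions))
    parent = {s: s for s in samples}  # union-find forest over samples

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        agree = total = 0.0
        for part, w in zip(partitions, weights):
            if a in part and b in part:  # only partitions covering both vote
                total += w
                if part[a] == part[b]:
                    agree += w
        if total > 0 and agree / total > threshold:
            parent[find(a)] = find(b)  # link the two samples

    clusters = {}
    for s in samples:
        clusters.setdefault(find(s), []).append(s)
    return sorted(clusters.values())


# Hypothetical data: the static clusterer cannot handle sample "e"
# (for example, a packed binary); the dynamic one cannot handle "a".
static_part = {"a": 0, "b": 0, "c": 1, "d": 1}
dynamic_part = {"b": 0, "c": 0, "d": 1, "e": 1}

result = consensus_partition([static_part, dynamic_part], weights=[0.9, 0.6])
print(result)  # [['a', 'b'], ['c', 'd', 'e']]
```

In this toy run the consensus covers all five samples although each base clusterer covers only four, and the disagreement over "c" is settled in favor of the partition with the higher weight. Linking pairs through union-find amounts to single-linkage merging on the co-association scores; a real system might use a different consensus function.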