A parameter-free hybrid clustering algorithm used for malware categorization

Authors:
ZhiXue Han;Shaorong Feng;Yanfang Ye;Qingshan Jiang
Affiliations:
Department of Computer Science, Xiamen University, Xiamen, China;Department of Computer Science, Xiamen University, Xiamen, China;Department of Computer Science, Xiamen University, Xiamen, China;Software School, Xiamen University, Xiamen, China
Venue:
ASID'09: Proceedings of the 3rd International Conference on Anti-counterfeiting, Security, and Identification in Communication
Year:
2009

Abstract

Attacks by malware such as viruses, backdoors, spyware, trojans, and worms pose a major security threat to computer users. The most significant line of defense against malware is anti-virus (AV) products, which detect, remove, and characterize these threats. How well AV products characterize threats depends greatly on the method used to categorize malware profiles into groups, so clustering malware into families is a computer security topic of great interest. In this paper, based on an analysis of the instructions extracted from malware samples, we propose a novel parameter-free hybrid clustering algorithm (PFHC) that combines the merits of hierarchical clustering and K-means for malware clustering. It not only generates a stable initial partition but also determines the best number of clusters K. PFHC uses agglomerative hierarchical clustering as its framework: it starts with N singleton clusters, each containing exactly one sample; at every level it reuses the centroids from the previous level and merges the two nearest clusters; it then runs K-means iterations to reach an approximately globally optimal partition. PFHC evaluates the clustering validity at each iteration and selects the best K by comparing these values. Experiments on a real daily data collection show that, compared with the popular K-means and hierarchical clustering approaches, PFHC generates much higher-quality clusters and is well suited to malware categorization.
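The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is one possible reading of the description, not the authors' implementation. The abstract does not name the distance measure, the validity index, or the feature encoding, so the sketch assumes Euclidean distance on numeric feature vectors and uses the mean silhouette width as a stand-in validity index. The function names (`pfhc`, `kmeans`, `silhouette`) are hypothetical.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def kmeans(points, centroids, max_iter=100):
    """Lloyd iterations seeded with the given centroids (no random init)."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = mean(members)
    # Drop clusters that ended up empty and renumber the labels.
    used = sorted(set(labels))
    remap = {old: new for new, old in enumerate(used)}
    return [centroids[c] for c in used], [remap[l] for l in labels]

def silhouette(points, labels):
    """Mean silhouette width; an ASSUMED validity index, not from the paper."""
    clusters = {}
    for i, l in enumerate(labels):
        clusters.setdefault(l, []).append(i)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(math.dist(p, points[j]) for j in idx) / len(idx)
                for l, idx in clusters.items() if l != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def pfhc(points):
    """Return (best_k, labels) for a list of numeric feature vectors."""
    points = [tuple(p) for p in points]
    centroids = list(points)        # start with N singleton clusters
    sizes = [1] * len(points)
    best_score, best_k, best_labels = -2.0, None, None
    while len(centroids) > 2:
        # Merge the two nearest clusters (size-weighted centroid).
        i, j = min(((a, b) for a in range(len(centroids))
                    for b in range(a + 1, len(centroids))),
                   key=lambda ab: math.dist(centroids[ab[0]], centroids[ab[1]]))
        w = sizes[i] + sizes[j]
        merged = tuple((sizes[i] * x + sizes[j] * y) / w
                       for x, y in zip(centroids[i], centroids[j]))
        seeds = [c for t, c in enumerate(centroids) if t not in (i, j)] + [merged]
        # Refine this level with K-means; its centroids seed the next level.
        centroids, labels = kmeans(points, seeds)
        sizes = [labels.count(t) for t in range(len(centroids))]
        score = silhouette(points, labels)
        if score > best_score:
            best_score, best_k, best_labels = score, len(centroids), labels
    return best_k, best_labels
```

Calling `pfhc(points)` on a list of numeric feature vectors returns the selected K and one cluster label per sample. The pairwise searches make this sketch roughly cubic in the number of samples, so it suits small illustrative inputs only.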