Evaluation of malware clustering based on its dynamic behaviour

Authors:
Ibai Gurrutxaga; Olatz Arbelaitz; Jesús Ma Pérez; Javier Muguerza; José I. Martín; Iñigo Perona
Affiliations:
University of the Basque Country, Donostia, Spain (all authors)
Venue:
AusDM '08 Proceedings of the 7th Australasian Data Mining Conference - Volume 87
Year:
2008

Abstract

Malware detection is an important problem today. New malware appears every day, and detecting it requires recognizing the families of existing malware. Data mining techniques are helpful in this context, in particular unsupervised learning methods. This work compares the behaviour of two representations for malware executables, a set of twelve distances for comparing them, and three variants of the hierarchical agglomerative clustering algorithm when used to capture the structure of different malware families and subfamilies. We propose a way to carry out this comparison in an unsupervised learning environment. Several conclusions can be drawn from the work. Concerning the algorithms, the best option is average-linkage, which seems to capture the structure represented by the distance better than the other variants. The evaluation of the distances is more complex, but some of them can be discarded because they behave clearly worse than the rest, and the group of best-behaving distances can be identified; the computational cost analysis can help when selecting the most convenient one.
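
As an illustration of the kind of comparison the abstract describes, the following is a minimal sketch of hierarchical agglomerative clustering run on a precomputed distance matrix. It is not the authors' implementation. The abstract does not name the three variants beyond average-linkage, so the sketch assumes the usual single, complete and average linkage. The function names (`linkage_distance`, `agglomerative`) and the toy matrix `TOY_DIST` are hypothetical, and the distance values are invented for illustration rather than taken from the paper.

```python
from itertools import combinations

# Hypothetical pairwise distances between five samples (illustrative only).
# Samples 0-2 are meant to resemble one family, samples 3-4 another.
TOY_DIST = [
    [0.00, 0.10, 0.20, 0.80, 0.90],
    [0.10, 0.00, 0.15, 0.85, 0.70],
    [0.20, 0.15, 0.00, 0.75, 0.80],
    [0.80, 0.85, 0.75, 0.00, 0.10],
    [0.90, 0.70, 0.80, 0.10, 0.00],
]


def linkage_distance(dist, a, b, method):
    """Distance between clusters a and b (lists of sample indices)."""
    pairs = [dist[i][j] for i in a for j in b]
    if method == "single":      # closest pair of members
        return min(pairs)
    if method == "complete":    # farthest pair of members
        return max(pairs)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(pairs) / len(pairs)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerative(dist, method="average", n_clusters=1):
    """Naive agglomerative clustering on a symmetric distance matrix.

    Returns (clusters, merges), where merges is the list of
    (cluster_a, cluster_b, distance) tuples in merge order.
    """
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > n_clusters:
        # Pick the pair of current clusters with the smallest linkage distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage_distance(
                dist, clusters[p[0]], clusters[p[1]], method
            ),
        )
        d = linkage_distance(dist, clusters[x], clusters[y], method)
        merges.append((clusters[x], clusters[y], d))
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters, merges


if __name__ == "__main__":
    for method in ("single", "complete", "average"):
        clusters, merges = agglomerative(TOY_DIST, method, n_clusters=2)
        print(method, sorted(clusters), "last merge at", merges[-1][2])
```

The same distance matrix is passed to each variant, so differences in the resulting hierarchy come only from the linkage rule. On real data a library routine would replace this naive implementation, which recomputes every cluster-to-cluster distance at each step.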