Scalable fine-grained behavioral clustering of HTTP-based malware

Authors:
Roberto Perdisci; Davide Ariu; Giorgio Giacinto
Affiliations:
Department of Computer Science, University of Georgia, 415 Boyd Graduate Studies Research Center, Athens, GA 30602-7404, United States; Department of Electric and Electronic Engineering, University of Cagliari, Piazza d'Armi, 09123 Cagliari, Italy; Department of Electric and Electronic Engineering, University of Cagliari, Piazza d'Armi, 09123 Cagliari, Italy
Venue:
Computer Networks: The International Journal of Computer and Telecommunications Networking
Year:
2013

Citing 27

Silhouettes: a graphical aid to the interpretation and validation of cluster analysis
Journal of Computational and Applied Mathematics

BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data

CURE: an efficient clustering algorithm for large databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data

OPTICS: ordering points to identify the clustering structure
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data

Service specific anomaly detection for network intrusion detection
Proceedings of the 2002 ACM symposium on Applied computing

On Clustering Validation Techniques
Journal of Intelligent Information Systems

Polygraph: Automatically Generating Signatures for Polymorphic Worms
SP '05 Proceedings of the 2005 IEEE Symposium on Security and Privacy

Misleading Worm Signature Generators Using Deliberate Noise Injection
SP '06 Proceedings of the 2006 IEEE Symposium on Security and Privacy

Exploring Multiple Execution Paths for Malware Analysis
SP '07 Proceedings of the 2007 IEEE Symposium on Security and Privacy

Panorama: capturing system-wide information flow for malware detection and analysis
Proceedings of the 14th ACM conference on Computer and communications security

Mining specifications of malicious behavior
ISEC '08 Proceedings of the 1st India software engineering conference

Hash-AV: fast virus signature scanning by cache-resident filters
International Journal of Security and Networks

Learning and Classification of Malware Behavior
DIMVA '08 Proceedings of the 5th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment

A Study of the Packer Problem and Its Solutions
RAID '08 Proceedings of the 11th international symposium on Recent Advances in Intrusion Detection

CloudAV: N-version antivirus in the network cloud
SS'08 Proceedings of the 17th conference on Security symposium

McPAD: A multiple classifier system for accurate payload-based anomaly detection
Computer Networks: The International Journal of Computer and Telecommunications Networking

Studying spamming botnets using Botlab
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation

Automated classification and analysis of internet malware
RAID'07 Proceedings of the 10th international conference on Recent advances in intrusion detection

Behavioral clustering of HTTP-based malware and signature generation using malicious network traces
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation

Effective and efficient malware detection at the end host
SSYM'09 Proceedings of the 18th conference on USENIX security symposium

Validity index for clusters of different sizes and densities
Pattern Recognition Letters

JACKSTRAWS: picking command and control connections from bot traffic
SEC'11 Proceedings of the 20th USENIX conference on Security

The power of procrastination: detection and mitigation of execution-stalling malicious code
Proceedings of the 18th ACM conference on Computer and communications security

BitShred: feature hashing malware for scalable triage and semantic analysis
Proceedings of the 18th ACM conference on Computer and communications security

Paragraph: thwarting signature learning by training maliciously
RAID'06 Proceedings of the 9th international conference on Recent Advances in Intrusion Detection

Anagram: a content anomaly detector resistant to mimicry attack
RAID'06 Proceedings of the 9th international conference on Recent Advances in Intrusion Detection

Some new indexes of cluster validity
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics

Cited 3

Editorial: Editorial for Computer Networks special issue on "Botnet Activity: Analysis, Detection and Shutdown"
Computer Networks: The International Journal of Computer and Telecommunications Networking

Is data clustering in adversarial settings secure?
Proceedings of the 2013 ACM workshop on Artificial intelligence and security

ExecScent: mining for new C&C domains in live networks with adaptive control protocol templates
SEC'13 Proceedings of the 22nd USENIX conference on Security

Abstract

A large number of today's botnets leverage HTTP to communicate with their botmasters or to perpetrate malicious activities. In this paper, we present a new scalable system for network-level behavioral clustering of HTTP-based malware that aims to efficiently group newly collected malware samples into malware family clusters. The end goal is to obtain malware clusters that can aid the automatic generation of high-quality network signatures, which can in turn be used to detect botnet command-and-control (C&C) and other malware-generated communications at the network perimeter. We achieve scalability in our clustering system by simplifying the multi-step clustering process proposed in [31], and by leveraging incremental clustering algorithms that run efficiently on very large datasets. At the same time, we show that scalability is achieved while retaining a good trade-off between detection rate and false positives for the signatures derived from the obtained malware clusters. We implemented a proof-of-concept version of our new scalable malware clustering system and performed experiments with about 65,000 distinct malware samples. Results from our evaluation confirm the effectiveness of the proposed system and show that, compared to [31], our approach can reduce processing times from several hours to a few minutes, and scales well to large datasets containing tens of thousands of distinct malware samples.
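
To make the idea of incremental clustering concrete, the following is a minimal sketch and not the system described in the paper. It assumes a simple single-pass "leader" scheme over HTTP request lines, with token sets as features and Jaccard distance as the dissimilarity measure. The function names (tokens, jaccard_distance, incremental_cluster), the feature choice, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
from typing import List, Set


def tokens(request: str) -> Set[str]:
    """Split an HTTP request line into coarse tokens (an assumed feature choice)."""
    for sep in "/?&=":
        request = request.replace(sep, " ")
    return set(request.split())


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def incremental_cluster(requests: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Single-pass leader clustering (illustrative only).

    Each request is compared against one representative ("leader") per
    existing cluster. It joins a nearest leader if the distance is within
    the threshold; otherwise it becomes the leader of a new cluster.
    """
    leaders: List[Set[str]] = []
    clusters: List[List[str]] = []
    for req in requests:
        feats = tokens(req)
        best_idx, best_dist = -1, threshold
        for i, leader in enumerate(leaders):
            d = jaccard_distance(feats, leader)
            if d <= best_dist:
                best_idx, best_dist = i, d
        if best_idx < 0:
            leaders.append(feats)
            clusters.append([req])
        else:
            clusters[best_idx].append(req)
    return clusters


if __name__ == "__main__":
    sample = [
        "GET /gate.php?id=1&ver=2",
        "GET /gate.php?id=7&ver=2",
        "POST /upload/stats.bin",
    ]
    for group in incremental_cluster(sample):
        print(group)
```

Because every request is compared only against one leader per cluster rather than against all earlier requests, the number of distance computations grows with the number of clusters instead of quadratically with the number of samples. This is the general property that makes single-pass schemes attractive for large datasets, though the result depends on input order and on the chosen threshold.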