Scalable fine-grained behavioral clustering of HTTP-based malware

  • Authors:
  • Roberto Perdisci; Davide Ariu; Giorgio Giacinto

  • Affiliations:
  • Department of Computer Science, University of Georgia, 415 Boyd Graduate Studies Research Center, Athens, GA 30602-7404, United States; Department of Electric and Electronic Engineering, University of Cagliari, Piazza d'Armi, 09123 Cagliari, Italy; Department of Electric and Electronic Engineering, University of Cagliari, Piazza d'Armi, 09123 Cagliari, Italy

  • Venue:
  • Computer Networks: The International Journal of Computer and Telecommunications Networking
  • Year:
  • 2013

Abstract

Many of today's botnets use the HTTP protocol to communicate with their botmasters or to perpetrate malicious activities. In this paper, we present a new scalable system for network-level behavioral clustering of HTTP-based malware that aims to efficiently group newly collected malware samples into malware family clusters. The end goal is to obtain malware clusters that can aid the automatic generation of high-quality network signatures, which can in turn be used to detect botnet command-and-control (C&C) and other malware-generated communications at the network perimeter. We achieve scalability in our clustering system by simplifying the multi-step clustering process proposed in [31], and by leveraging incremental clustering algorithms that run efficiently on very large datasets. At the same time, we show that this scalability is achieved while retaining a good trade-off between detection rate and false positives for the signatures derived from the obtained malware clusters. We implemented a proof-of-concept version of our new scalable malware clustering system and performed experiments with about 65,000 distinct malware samples. Results from our evaluation confirm the effectiveness of the proposed system and show that, compared to [31], our approach can reduce processing times from several hours to a few minutes, and that it scales well to large datasets containing tens of thousands of distinct malware samples.
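The abstract does not say which incremental clustering algorithm the system uses, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a single-pass "leader" scheme and a Jaccard distance over coarse tokens of each HTTP request line. The function names, the tokenization, and the threshold are all hypothetical choices made for this example.

```python
from typing import List, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def tokenize(request: str) -> Set[str]:
    """Split a request line into coarse tokens (hypothetical feature choice)."""
    for sep in "/?&=":
        request = request.replace(sep, " ")
    return set(request.split())


def incremental_cluster(requests: List[str], threshold: float) -> List[List[str]]:
    """Single-pass leader clustering (an assumed stand-in technique).

    Each request joins the first cluster whose leader is within
    `threshold`; otherwise it becomes the leader of a new cluster.
    """
    leaders: List[Set[str]] = []
    clusters: List[List[str]] = []
    for req in requests:
        tokens = tokenize(req)
        for i, leader in enumerate(leaders):
            if jaccard_distance(tokens, leader) <= threshold:
                clusters[i].append(req)
                break
        else:
            # No existing leader is close enough: start a new cluster.
            leaders.append(tokens)
            clusters.append([req])
    return clusters


if __name__ == "__main__":
    sample = [
        "GET /gate.php?id=1&ver=2",
        "GET /gate.php?id=7&ver=2",
        "POST /upload/img.png",
    ]
    for cluster in incremental_cluster(sample, threshold=0.5):
        print(cluster)
```

Because each new request is compared only against the current cluster leaders, the whole dataset is processed in one pass and new samples can be added without re-clustering. This is the kind of property that lets incremental schemes handle large collections, although the result depends on input order and on the chosen threshold.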