Clustering botnet communication traffic based on n-gram feature selection

  • Authors:
  • Wei Lu;Goaletsa Rammidi;Ali A. Ghorbani

  • Affiliations:
  • Faculty of Computer Science, University of New Brunswick, 540 Windsor Street, Gillin Hall, E126, Fredericton, Canada NB E3B 5A3 (all three authors)

  • Venue:
  • Computer Communications
  • Year:
  • 2011

Quantified Score

Hi-index 0.24

Abstract

Recognized as one of the most serious security threats to current Internet infrastructure, botnets can be implemented not only over existing well-known applications (e.g. IRC, HTTP, or Peer-to-Peer) but also over unknown or novel applications, which makes botnet detection a challenging problem. Previous attempts to detect botnets mostly examine traffic content for bot commands on selected network links or set up honeypots. As botnets evolve, however, traffic content can be encrypted, which causes content-based detection approaches to fail. In this paper, we address this issue and propose a new approach for detecting and clustering botnet traffic on large-scale network application communities. We first classify network traffic into different applications using traffic payload signatures. A novel decision tree model then assigns the traffic that payload content cannot identify (e.g. encrypted traffic) to known application communities. Within each community, network traffic is clustered based on n-gram features selected and extracted from the content of network flows, in order to differentiate malicious traffic created by bots from normal traffic generated by human users of each specific application. We evaluate our approach with seven different traffic traces collected on three different network links, and the results show that the proposed approach successfully detects two IRC botnet traffic traces with a high detection rate and an acceptably low false alarm rate.
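
To make the clustering step concrete, the following is a minimal sketch of n-gram feature extraction followed by a two-cluster grouping of flows. It is not the authors' implementation: the abstract does not specify the n-gram size, the feature selection procedure, the distance measure, or the clustering algorithm. The sketch assumes byte-level n-grams, Euclidean distance, and a plain 2-means loop as stand-ins, and the names `ngram_frequencies`, `euclidean`, `mean_profile`, and `two_means` are hypothetical.

```python
import math
from collections import Counter


def ngram_frequencies(payload: bytes, n: int = 1) -> dict:
    """Return the relative frequency of each byte n-gram in a flow payload."""
    total = len(payload) - n + 1
    if total <= 0:
        return {}
    counts = Counter(payload[i:i + n] for i in range(total))
    return {gram: count / total for gram, count in counts.items()}


def euclidean(a: dict, b: dict) -> float:
    """Euclidean distance between two sparse frequency profiles."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def mean_profile(profiles: list) -> dict:
    """Average a non-empty list of sparse frequency profiles."""
    keys = set().union(*profiles)
    return {k: sum(p.get(k, 0.0) for p in profiles) / len(profiles) for k in keys}


def two_means(profiles: list, iterations: int = 10) -> list:
    """Assumed stand-in clustering: 2-means over n-gram profiles.

    Seeds the two centers with the first profile and the profile
    farthest from it, then alternates assignment and center updates.
    """
    centers = [
        profiles[0],
        max(profiles, key=lambda p: euclidean(p, profiles[0])),
    ]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        labels = [
            min((0, 1), key=lambda c: euclidean(p, centers[c]))
            for p in profiles
        ]
        for c in (0, 1):
            members = [p for p, label in zip(profiles, labels) if label == c]
            if members:
                centers[c] = mean_profile(members)
    return labels


if __name__ == "__main__":
    # Made-up payloads for illustration only; not taken from the paper's traces.
    flows = [
        b"PRIVMSG #chan :.scan 10.0.0.1 445",
        b"PRIVMSG #chan :.scan 10.0.0.2 445",
        b"PRIVMSG #chan :anyone up for lunch later?",
        b"PRIVMSG #chan :sure, see you at noon",
    ]
    profiles = [ngram_frequencies(f, n=2) for f in flows]
    print(two_means(profiles))
```

The intuition the sketch tries to capture is that automated bot traffic tends to repeat the same byte patterns, so its n-gram profiles sit close together, while human-generated traffic is more varied. Whether two clusters and raw frequencies are sufficient on real traces is an open assumption here.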