Unsupervised traffic classification using flow statistical properties and IP packet payload

  • Authors: Jun Zhang, Yang Xiang, Wanlei Zhou, Yu Wang

  • Affiliations: School of Information Technology, Deakin University, Australia (all four authors)

  • Venue: Journal of Computer and System Sciences
  • Year: 2013

Abstract

In network traffic classification, "unknown applications" remain a difficult, unsolved problem. Conventional supervised classification methods assign every traffic flow to one of a set of predefined classes, so they cannot handle unknown applications for which no labelled training data exist. Some unsupervised clustering algorithms, such as k-means, have been applied to group traffic flows automatically, but they produce a large number of clusters that do not map cleanly onto the small number of real applications. To address the problem of unknown applications, we propose a novel unsupervised approach that discovers application-based traffic classes and classifies traffic flows according to the applications that generated them. In the training stage, flow statistical properties and IP packet payload are used in combination to discover traffic classes. We introduce a bag-of-words (BoW) model to represent the content of clusters constructed from flow statistical features, and apply latent semantic analysis (LSA) to aggregate similar traffic clusters based on their payload content. In the testing stage, only flow statistical features are used to classify traffic flows, which protects user privacy and handles known encrypted applications without inspecting IP packets. A number of experiments are carried out on a real-world traffic dataset to demonstrate the effectiveness and robustness of the proposed approach.
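
The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how payload "words" are formed, how cluster similarity is thresholded, or how test flows are assigned, so the 3-byte n-gram vocabulary, the cosine threshold of 0.8, the nearest-centroid classifier, the farthest-first k-means initialisation, and all function names (`kmeans`, `cluster_bow`, `merge_clusters_lsa`, `classify`) are assumptions made for illustration. The data are hand-made toy values, not results from the paper.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Lloyd's k-means with deterministic farthest-first initialisation (assumed)."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def cluster_bow(payloads, labels, k, n=3):
    """Cluster-by-word count matrix; a 'word' is a byte n-gram (assumed)."""
    vocab, rows = {}, [dict() for _ in range(k)]
    for payload, c in zip(payloads, labels):
        for i in range(len(payload) - n + 1):
            idx = vocab.setdefault(payload[i:i + n], len(vocab))
            rows[c][idx] = rows[c].get(idx, 0) + 1
    M = np.zeros((k, len(vocab)))
    for c, row in enumerate(rows):
        for idx, count in row.items():
            M[c, idx] = count
    return M

def merge_clusters_lsa(M, rank=2, threshold=0.8):
    """Project clusters into a rank-limited LSA space and merge similar ones."""
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    r = min(rank, len(S))
    Z = U[:, :r] * S[:r]                       # cluster coordinates in LSA space
    Z = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    sim = Z @ Z.T                              # cosine similarity between clusters
    parent = list(range(len(M)))
    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i
    for i in range(len(M)):
        for j in range(i + 1, len(M)):
            if sim[i, j] > threshold:
                parent[find(j)] = find(i)      # union the two clusters
    return [find(i) for i in range(len(M))]    # traffic class id per cluster

def classify(flow, centers, classes):
    """Testing stage: flow statistics only, nearest centroid (assumed)."""
    return classes[int(((centers - flow) ** 2).sum(axis=1).argmin())]

# Toy training set: four tight groups in a 2-D flow-feature space.
# Groups 0 and 1 carry HTTP-like payloads, groups 2 and 3 SSH-like payloads.
offsets = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
X = np.vstack([np.array(g, dtype=float) + offsets
               for g in [(0, 0), (10, 0), (0, 10), (10, 10)]])
payloads = [
    b"GET /index.html HTTP/1.1", b"GET /a.css HTTP/1.1", b"GET /b.js HTTP/1.1",
    b"GET /index.html HTTP/1.1", b"GET /c.png HTTP/1.1", b"GET /d.gif HTTP/1.1",
    b"SSH-2.0-OpenSSH_8.9", b"SSH-2.0-OpenSSH_8.9", b"SSH-2.0-OpenSSH_7.4",
    b"SSH-2.0-OpenSSH_7.4", b"SSH-2.0-OpenSSH_7.4", b"SSH-2.0-OpenSSH_8.9",
]

labels, centers = kmeans(X, k=4)               # step 1: cluster on flow statistics
M = cluster_bow(payloads, labels, k=4)         # step 2: BoW of payload per cluster
classes = merge_clusters_lsa(M)                # step 3: LSA-based aggregation
predicted = classify(np.array([9.9, 0.2]), centers, classes)  # no payload needed
```

In this toy setup the four statistical clusters collapse into two traffic classes, because clusters that share payload vocabulary end up pointing in the same direction in the reduced LSA space. Note that `classify` never touches payload bytes, which mirrors the privacy argument made for the testing stage.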