A preliminary performance comparison of five machine learning algorithms for practical IP traffic flow classification

  • Authors:
  • Nigel Williams;Sebastian Zander;Grenville Armitage

  • Affiliations:
  • Swinburne University of Technology, Melbourne, Australia;Swinburne University of Technology, Melbourne, Australia;Swinburne University of Technology, Melbourne, Australia

  • Venue:
  • ACM SIGCOMM Computer Communication Review
  • Year:
  • 2006

Abstract

Identifying network applications by observing their packet traffic flows is vital to network management and surveillance. Currently popular methods, such as port number and payload-based identification, exhibit a number of shortfalls. An alternative is to use machine learning (ML) techniques to identify network applications from per-flow statistics derived from payload-independent features such as packet length and inter-arrival time distributions. We demonstrate the performance impact of feature set reduction, using Consistency-based and Correlation-based feature selection, on the Naïve Bayes, C4.5, Bayesian Network and Naïve Bayes Tree algorithms. We then show that it is useful to differentiate algorithms by computational performance rather than classification accuracy alone: although classification accuracy is similar across the algorithms, computational performance can differ significantly.
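
As a rough illustration of what "per-flow statistics derived from payload-independent features" could look like, the sketch below summarises one flow from packet timestamps and lengths alone. The function name `flow_features`, the input layout, and the particular statistics chosen are assumptions made for this example; they are not taken from the paper and may differ from the feature set the authors used.

```python
from statistics import mean, pstdev


def flow_features(packets):
    """Summarise one flow without inspecting payloads.

    packets: list of (timestamp_seconds, length_bytes) tuples for a single
    flow (hypothetical input layout, assumed for this sketch).
    """
    if not packets:
        raise ValueError("flow has no packets")

    ordered = sorted(packets)  # order packets by timestamp
    lengths = [length for _, length in ordered]
    times = [ts for ts, _ in ordered]
    # Inter-arrival times: gaps between consecutive packets.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]

    return {
        "pkt_count": len(lengths),
        "len_min": min(lengths),
        "len_mean": mean(lengths),
        "len_max": max(lengths),
        "len_std": pstdev(lengths),
        # A single-packet flow has no gaps, so fall back to 0.0.
        "iat_min": min(gaps) if gaps else 0.0,
        "iat_mean": mean(gaps) if gaps else 0.0,
        "iat_max": max(gaps) if gaps else 0.0,
        "iat_std": pstdev(gaps) if gaps else 0.0,
    }
```

Each flow then becomes one fixed-length feature vector, which is the form a classifier or a feature selection step would consume.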