MINETRAC: mining flows for unsupervised analysis & semi-supervised classification

  • Authors:
  • Pedro Casas; Johan Mazel; Philippe Owezarski

  • Affiliations:
  • CNRS/ LAAS/ Toulouse Cedex, France, and Université de Toulouse/ UPS, INSA, INP, ISAE/ UT, UTM, LAAS/ Toulouse Cedex, France (all three authors)

  • Venue:
  • Proceedings of the 23rd International Teletraffic Congress
  • Year:
  • 2011

Abstract

Driven by the well-known limitations of port-based and payload-based analysis techniques, the use of Machine Learning for Internet traffic analysis and classification has become a fertile research area over the past half-decade. In this paper we introduce MINETRAC, a combination of unsupervised and semi-supervised machine learning techniques capable of identifying and classifying different classes of IP flows that share similar characteristics. The unsupervised analysis relies on robust clustering techniques, using Sub-Space Clustering, Evidence Accumulation, and Hierarchical Clustering algorithms to explore inter-flow structure. MINETRAC identifies natural groupings of traffic flows by combining the evidence of data structure provided by different partitions of the same set of traffic flows. Automatic classification is performed by means of semi-supervised learning, using only a small fraction of ground-truth flows to map each identified cluster to its most probable originating network service or application. We evaluate the performance of MINETRAC on real traffic traces and compare it against previously proposed clustering-based flow analysis methods and supervised/semi-supervised classification approaches.
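
The pipeline outlined in the abstract (partition the flows in several sub-spaces, accumulate the evidence, cut the result into clusters, then label clusters from a few ground-truth flows) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the algorithms' details, so the function names (`components`, `evidence_accumulation`, `label_clusters`), the parameters `eps` and `min_agreement`, the three-feature flows, and the labels "web" and "p2p" are all assumptions made for illustration. A simple distance-threshold grouping stands in for the per-sub-space clustering, and the final cut on the co-association counts behaves like single-linkage hierarchical clustering stopped at a fixed level.

```python
from collections import Counter
from itertools import combinations


def components(n, linked):
    """Group indices 0..n-1 into connected components of the `linked` relation."""
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and linked(i, j):
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def evidence_accumulation(flows, eps=1.0, min_agreement=0.5):
    """Cluster flows in every 2-feature sub-space, then merge the partitions."""
    n = len(flows)
    subspaces = list(combinations(range(len(flows[0])), 2))
    votes = [[0] * n for _ in range(n)]  # co-association counts per flow pair
    for a, b in subspaces:
        def close(i, j, a=a, b=b):
            # Assumed stand-in clustering: link flows within eps in this sub-space.
            dx = flows[i][a] - flows[j][a]
            dy = flows[i][b] - flows[j][b]
            return dx * dx + dy * dy <= eps * eps
        part = components(n, close)
        for i in range(n):
            for j in range(n):
                if part[i] == part[j]:
                    votes[i][j] += 1
    # Keep pairs that were co-clustered in at least min_agreement of the sub-spaces.
    needed = min_agreement * len(subspaces)
    return components(n, lambda i, j: votes[i][j] >= needed)


def label_clusters(clusters, known):
    """Map each cluster to the majority label among its few labeled flows."""
    tallies = {}
    for idx, label in known.items():
        tallies.setdefault(clusters[idx], Counter())[label] += 1
    names = {c: t.most_common(1)[0][0] for c, t in tallies.items()}
    return [names.get(c, "unknown") for c in clusters]


# Hypothetical, already-normalized flow features (made-up values).
flows = [
    (0.1, 0.2, 0.1), (0.3, 0.1, 0.2), (0.2, 0.3, 0.3),
    (5.0, 5.1, 4.9), (5.2, 4.8, 5.0), (4.9, 5.0, 5.2),
    (10.0, 0.0, 10.0),
]
clusters = evidence_accumulation(flows)
# Only two flows carry ground truth; the rest inherit their cluster's label.
labels = label_clusters(clusters, {0: "web", 4: "p2p"})
print(clusters, labels)
```

In this sketch a cluster with no ground-truth flow is reported as "unknown" rather than being forced into a known class, which is one possible design choice and not something the abstract prescribes.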