Host-Based P2P Flow Identification and Use in Real-Time

  • Authors:
  • John Hurley; Emi Garcia-Palacios; Sakir Sezer

  • Affiliations:
  • Queen’s University of Belfast; Queen’s University of Belfast; Queen’s University of Belfast

  • Venue:
  • ACM Transactions on the Web (TWEB)
  • Year:
  • 2011

Abstract

Data identification and classification are key tasks for any Internet Service Provider (ISP) or network administrator. As port fluctuation and encryption become more common in P2P applications that wish to avoid identification, new strategies must be developed to detect and classify their flows. This article introduces a method for separating P2P and standard web traffic based on the activity of the hosts on the network, which can be applied as part of an offline data analysis process. Heuristics are analyzed and a classification system is proposed that focuses on the “long” flows that transfer most of the bytes across a network. The accuracy of the system is then tested on real network traffic from a core Internet router, with misclassification rates as low as 0.54% of flows in some cases. We then extend this strategy to investigate its relevance to real-time, early classification problems. New proposals are made, and the results of real-time experiments are compared with those obtained in the offline analysis. The results show that the real-time strategy achieves classification accuracies similar to those of the offline analysis, correctly identifying a large portion of the total web and P2P flows.
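The abstract does not spell out the article's heuristics. As a rough illustration only, the sketch below shows the general shape of a host-based pipeline: keep the heavy ("long") flows that carry most of the bytes, label them from host connection patterns rather than port numbers, and score the result as a misclassification rate. The `Flow` record, the byte threshold, and the IP/port-pair ratio rule are hypothetical stand-ins (the ratio rule is a generic heuristic from the host-based P2P identification literature), not the classifier proposed in the article.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record: one transport-layer connection."""
    src: str     # remote (client or peer) IP
    sport: int   # remote port
    dst: str     # endpoint IP being contacted
    dport: int   # endpoint port being contacted
    nbytes: int  # bytes transferred


# Hypothetical cut-off: flows carrying at least this many bytes count as "long".
LONG_FLOW_BYTES = 1_000_000


def long_flows(flows: List[Flow], threshold: int = LONG_FLOW_BYTES) -> List[Flow]:
    """Keep only the heavy flows, which carry most of the bytes on a link."""
    return [f for f in flows if f.nbytes >= threshold]


def classify(flows: List[Flow], ports_per_ip: float = 2.0) -> List[str]:
    """Label each flow 'web' or 'p2p' from host behaviour, ignoring port numbers.

    For every endpoint (dst, dport), count the distinct remote IPs and the
    distinct remote ports that contact it. Browsers tend to open several
    parallel connections to one server, so a web endpoint sees many ports
    per IP; a P2P endpoint tends to see about one port per peer IP.
    """
    ips = defaultdict(set)
    ports = defaultdict(set)
    for f in flows:
        ips[(f.dst, f.dport)].add(f.src)
        ports[(f.dst, f.dport)].add(f.sport)

    labels = []
    for f in flows:
        key = (f.dst, f.dport)
        many_ports = len(ports[key]) >= ports_per_ip * len(ips[key])
        labels.append("web" if many_ports else "p2p")
    return labels


def misclassification_rate(predicted: List[str], truth: List[str]) -> float:
    """Fraction of flows whose predicted label differs from the ground truth."""
    wrong = sum(p != t for p, t in zip(predicted, truth))
    return wrong / len(truth)


# Toy data: one client opening four parallel connections to a web server,
# three peers each opening one connection to an endpoint on an arbitrary port,
# and one small flow that the "long flow" filter discards.
flows = (
    [Flow("192.0.2.1", p, "10.0.0.1", 80, 2_000_000) for p in range(50001, 50005)]
    + [Flow(f"192.0.2.{i}", 40000 + i, "10.0.0.2", 45682, 5_000_000) for i in range(10, 13)]
    + [Flow("192.0.2.99", 53000, "10.0.0.3", 53, 100)]
)
heavy = long_flows(flows)
labels = classify(heavy)
print(labels)  # ['web', 'web', 'web', 'web', 'p2p', 'p2p', 'p2p']
```

Because the rule looks only at connection patterns, changing ports does not affect it, but it has obvious blind spots: a web client that opens a single connection would be labelled `p2p`. This is one reason host-based schemes analyze several heuristics together rather than relying on one.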