Revealing the Unknown ADSL Traffic Using Statistical Methods

Authors:
Marcin Pietrzyk;Guillaume Urvoy-Keller;Jean-Laurent Costeux
Affiliations:
Orange Labs, France;Institute Eurecom, France;Orange Labs, France
Venue:
TMA '09 Proceedings of the First International Workshop on Traffic Monitoring and Analysis
Year:
2009

Abstract

Traffic classification is one of the most significant issues for ISPs and network administrators. Recent research on the subject has produced a large variety of algorithms and methods applicable to the problem. In this work, we focus on several issues that have not received enough attention so far. The first is the establishment of an accurate reference point. We use an ISP-internal Deep Packet Inspection (DPI) tool and compare its results with those of state-of-the-art, freely available classification tools, finding significant differences. We relate those differences to the weakness of some signatures and to the heuristics and design choices made by DPI tools. Second, we highlight methodological issues behind the choice of traffic classes and the way the results of a statistical classifier are analyzed. Last, we focus on the often overlooked problem of mining the unknown traffic, i.e., traffic not classified by the DPI tool used to establish the reference point. We present a method that relies on the confidence level of the statistical classification to reveal the unknown traffic. We further discuss the results of the classifier using a variety of heuristics.
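
The idea of using classification confidence to reveal unknown traffic can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a statistical classifier has already produced per-class probabilities for each flow that the DPI tool left unclassified. The class names, the probability values, the 0.95 threshold, and the function names `label_by_confidence` and `reveal_unknown` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def label_by_confidence(
    class_probs: Dict[str, float], threshold: float = 0.95
) -> Tuple[Optional[str], float]:
    """Return (label, confidence) for one flow.

    The label is None when the most likely class does not reach the
    threshold, in which case the flow stays in the unknown set.
    """
    best_class = max(class_probs, key=class_probs.get)
    confidence = class_probs[best_class]
    if confidence >= threshold:
        return best_class, confidence
    return None, confidence


def reveal_unknown(
    flows: List[Dict[str, float]], threshold: float = 0.95
) -> List[Optional[str]]:
    """Label each DPI-unknown flow only if the classifier is confident enough."""
    return [label_by_confidence(probs, threshold)[0] for probs in flows]


# Hypothetical classifier outputs for three flows the DPI tool could not label.
unknown_flows = [
    {"WEB": 0.97, "P2P": 0.02, "MAIL": 0.01},
    {"WEB": 0.40, "P2P": 0.55, "MAIL": 0.05},
    {"WEB": 0.01, "P2P": 0.98, "MAIL": 0.01},
]

labels = reveal_unknown(unknown_flows, threshold=0.95)
print(labels)  # ['WEB', None, 'P2P']
```

A higher threshold leaves more flows unlabeled but makes the revealed labels more trustworthy. The appropriate value depends on the classifier and the traffic, and is not specified here.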