Internet traffic classification demystified: on the sources of the discriminative power

Authors:
Yeon-sup Lim;Hyun-chul Kim;Jiwoong Jeong;Chong-kwon Kim;Ted "Taekyoung" Kwon;Yanghee Choi
Affiliations:
University of Massachusetts, Amherst, MA;Seoul National University, Seoul, Korea;Seoul National University, Seoul, Korea;Seoul National University, Seoul, Korea;Seoul National University, Seoul, Korea;Seoul National University, Seoul, Korea
Venue:
Proceedings of the 6th International Conference
Year:
2010

Abstract

Recent research on Internet traffic classification has yielded a number of data mining techniques for distinguishing types of traffic, but no systematic analysis of why some algorithms achieve high accuracy. In pursuit of empirically grounded answers to this "why" question, which is critical to understanding and establishing a scientific foundation for traffic classification research, this paper reveals three sources of discriminative power in classifying Internet application traffic: (i) ports, (ii) the sizes of the first one to two (for UDP flows) or four to five (for TCP flows) packets, and (iii) discretization of those features. We find that C4.5 performs best in all circumstances, and we identify the reason: the algorithm discretizes input features during classification. We also find that entropy-based Minimum Description Length discretization of the port and packet size features substantially improves the classification accuracy of every machine learning algorithm tested (by as much as 59.8%) and brings all of them to 93% accuracy on average without any algorithm-specific tuning. Our results indicate that treating the port and packet size features as discrete nominal intervals, rather than as continuous numbers, is the essential basis for accurate traffic classification (i.e., the features should be discretized first), regardless of the classification algorithm used.
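
To make the idea of entropy-based Minimum Description Length discretization concrete, the following is a minimal sketch, assuming the commonly used Fayyad-Irani stopping criterion; the abstract does not spell out the exact procedure, so this should not be read as the authors' implementation. The function names (`entropy`, `best_cut`, `mdl_cut_points`, `discretize`) and the toy first-packet sizes and application labels are invented for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(pairs):
    """Return (index, weighted_entropy) of the lowest-entropy boundary, or None.

    `pairs` is a list of (value, label) sorted by value; a cut at index i
    separates pairs[:i] from pairs[i:].
    """
    n = len(pairs)
    labels = [lab for _, lab in pairs]
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary exists between two equal values
        left, right = labels[:i], labels[i:]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if best is None or e < best[1]:
            best = (i, e)
    return best


def mdl_cut_points(values, labels):
    """Recursively collect cut points accepted by the MDL criterion."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])

    def recurse(part):
        found = best_cut(part)
        if found is None:
            return []
        i, split_entropy = found
        n = len(part)
        all_labels = [lab for _, lab in part]
        left, right = all_labels[:i], all_labels[i:]
        gain = entropy(all_labels) - split_entropy
        k, k1, k2 = len(set(all_labels)), len(set(left)), len(set(right))
        delta = math.log2(3 ** k - 2) - (
            k * entropy(all_labels) - k1 * entropy(left) - k2 * entropy(right)
        )
        # Accept the cut only if the information gain pays for its coding cost.
        if gain <= (math.log2(n - 1) + delta) / n:
            return []
        cut = (part[i - 1][0] + part[i][0]) / 2
        return recurse(part[:i]) + [cut] + recurse(part[i:])

    return recurse(pairs)


def discretize(value, cuts):
    """Map a numeric value to the index of the interval it falls in."""
    return sum(value > c for c in cuts)


# Toy data (invented): first-packet sizes in bytes for two applications.
sizes = list(range(60, 80)) + list(range(500, 520))
apps = ["dns"] * 20 + ["web"] * 20
cuts = mdl_cut_points(sizes, apps)
print(cuts, discretize(70, cuts), discretize(510, cuts))
```

After discretization, a classifier sees each packet size or port as an interval index (a nominal value) rather than a raw number, which is the representation the abstract argues for. When the labels do not separate along the feature, the criterion rejects every candidate cut and the feature collapses to a single interval.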