Recent research on Internet traffic classification has yielded a number of data mining techniques for distinguishing types of traffic, but no systematic analysis of why some algorithms achieve high accuracy. In pursuit of empirically grounded answers to this "why" question, which is critical to understanding traffic classification and establishing a scientific basis for research on it, this paper reveals three sources of discriminative power in classifying Internet application traffic: (i) ports, (ii) the sizes of the first one or two packets (for UDP flows) or the first four or five packets (for TCP flows), and (iii) discretization of those features. We find that C4.5 performs the best under all circumstances, and we identify the reason: the algorithm discretizes input features during classification. We also find that entropy-based Minimum Description Length discretization of the port and packet size features substantially improves the classification accuracy of every machine learning algorithm tested (by as much as 59.8%) and brings all of them to 93% accuracy on average without any algorithm-specific tuning. Our results indicate that treating the port and packet size features as discrete nominal intervals, not as continuous numbers, is the essential basis for accurate traffic classification, regardless of the classification algorithm used; that is, the features should be discretized first.
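The abstract does not spell out the discretization procedure. As an illustration only, the sketch below assumes the widely used Fayyad and Irani formulation of entropy-based MDL discretization: recursively pick the cut point with the highest information gain and keep it only if the gain exceeds an MDL-derived threshold. The function names (`mdl_cut_points`, `discretize`) and the toy first-packet sizes and class labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def _split(pairs, cuts):
    """Recursively add accepted cut points for sorted (value, label) pairs."""
    n = len(pairs)
    labels = [y for _, y in pairs]
    ent_s = entropy(labels)
    if n < 2 or ent_s == 0.0:
        return  # nothing to split, or the subset is already pure
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a cut is only possible between distinct values
        e1, e2 = entropy(labels[:i]), entropy(labels[i:])
        gain = ent_s - (i / n) * e1 - ((n - i) / n) * e2
        if best is None or gain > best[0]:
            best = (gain, i, e1, e2)
    if best is None:
        return
    gain, i, e1, e2 = best
    # MDL acceptance test (assumed Fayyad-Irani criterion).
    k = len(set(labels))
    k1, k2 = len(set(labels[:i])), len(set(labels[i:]))
    delta = math.log2(3 ** k - 2) - (k * ent_s - k1 * e1 - k2 * e2)
    if gain <= (math.log2(n - 1) + delta) / n:
        return  # the split does not pay for its description length
    cuts.append((pairs[i - 1][0] + pairs[i][0]) / 2)
    _split(pairs[:i], cuts)
    _split(pairs[i:], cuts)


def mdl_cut_points(values, labels):
    """Return the sorted cut points chosen for one numeric feature."""
    cuts = []
    _split(sorted(zip(values, labels)), cuts)
    return sorted(cuts)


def discretize(value, cuts):
    """Map a numeric value to the index of the interval it falls in."""
    return sum(value > c for c in cuts)


# Hypothetical first-packet sizes (bytes) for two made-up traffic classes.
sizes = [60, 62, 64, 66, 68, 70, 72, 74, 500, 510, 520, 530, 540, 550, 560, 570]
labels = ["dns"] * 8 + ["web"] * 8
cuts = mdl_cut_points(sizes, labels)  # → [287.0]
bins = [discretize(s, cuts) for s in sizes]
```

After this step a classifier would see the interval index (a nominal value) rather than the raw packet size, which is the kind of preprocessing the abstract argues for. The same routine could be applied to port numbers under the same assumptions.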