Decision tree network traffic classifier via adaptive hierarchical clustering for imperfect training dataset

Authors:
Ping Lin; Zhenming Lei; Luying Chen; Jie Yang; Fang Liu
Affiliations:
Key Laboratory of Information Processing and Intelligent Technology, Beijing University of Posts and Telecommunications, Beijing, China (all authors)
Venue:
WiCOM'09: Proceedings of the 5th International Conference on Wireless Communications, Networking and Mobile Computing
Year:
2009

Abstract

Existing network traffic classifiers often assume that an ideal training dataset is available. In practice, however, the training dataset may contain a substantial number of flows labeled 'unknown', including both flows from classes that the classifier does not model and unrecognized flows from modeled classes. Such a dataset seriously degrades the recall and generalization capability of existing classifiers, which treat unknowns as just another normal class. In this paper, we propose a semi-supervised multivariate decision tree classification algorithm based on adaptive hierarchical clustering. Rather than using the Gini index or information gain, both of which rely on a perfect training dataset, we construct the decision tree by adaptive hierarchical clustering. The clustering process can identify unknown flows that belong to modeled classes, avoiding the pitfall of existing algorithms that treat them the same as genuine unknowns. After mapping each leaf cluster to a class according to the majority of its members, and deriving decision rules from the cluster centers, we obtain a multivariate decision tree. Experimental results show that our algorithm can significantly improve the recall of flows belonging to modeled classes compared with a decision tree classifier, with only a small impact on precision.
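The abstract describes the pipeline only at a high level. What follows is a minimal sketch, not the authors' algorithm, of how its three steps could fit together: hierarchical clustering builds the tree, each leaf cluster is mapped to its majority label, and new flows are routed by the nearest cluster center. Assumptions, all hypothetical: the "adaptive" clustering is replaced by recursive 2-means bisection that stops once a node's majority-label share reaches a `purity` threshold; the names `build`, `classify`, `purity`, `min_size`, and `max_depth`, as well as the two-dimensional toy features, are illustrative and do not come from the paper. Because a nearest-of-two-centers test is a hyperplane over all features, each internal node acts as a multivariate split.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    """Component-wise mean (cluster center) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def two_means(points, max_iter=50):
    """Bisect points with plain 2-means (a stand-in for the paper's clustering).

    Returns two lists of indices, or None if the points cannot be split.
    """
    c0 = points[0]
    c1 = max(points, key=lambda p: dist2(p, c0))  # farthest point seeds cluster 1
    if dist2(c0, c1) == 0:
        return None  # all points identical: nothing to split
    assign = None
    for _ in range(max_iter):
        new = [0 if dist2(p, c0) <= dist2(p, c1) else 1 for p in points]
        if new == assign:
            break  # assignments stable: converged
        assign = new
        g0 = [p for p, a in zip(points, assign) if a == 0]
        g1 = [p for p, a in zip(points, assign) if a == 1]
        if not g0 or not g1:
            return None
        c0, c1 = mean(g0), mean(g1)
    return ([i for i, a in enumerate(assign) if a == 0],
            [i for i, a in enumerate(assign) if a == 1])


@dataclass
class Node:
    center: Point  # mean of the training flows that reached this node
    label: str     # majority label of those flows ("unknown" counts as a label)
    children: List["Node"] = field(default_factory=list)


def build(points, labels, purity=0.75, min_size=4, depth=0, max_depth=8):
    """Grow the tree top-down by recursive bisection (hypothetical stop rule)."""
    label, top = Counter(labels).most_common(1)[0]
    node = Node(mean(points), label)
    # Hypothetical adaptive stop: make a leaf once the majority share reaches
    # `purity`, or when the node is too small or too deep to split further.
    if top / len(labels) >= purity or len(points) < min_size or depth >= max_depth:
        return node
    split = two_means(points)
    if split is None:
        return node
    for idx in split:
        node.children.append(build([points[i] for i in idx], [labels[i] for i in idx],
                                   purity, min_size, depth + 1, max_depth))
    return node


def classify(node, x):
    """Descend to the child with the nearest center until a leaf is reached."""
    while node.children:
        node = min(node.children, key=lambda c: dist2(x, c.center))
    return node.label


if __name__ == "__main__":
    # Hypothetical 2-D flow features; one stray "unknown" sits among "web" flows.
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
         (10, 10), (10, 11), (11, 10), (11, 11),
         (20, 0), (20, 1), (21, 0)]
    y = ["web"] * 4 + ["unknown"] + ["p2p"] * 4 + ["unknown"] * 3
    tree = build(X, y)
    print(classify(tree, (0.5, 0.5)))   # web: the stray unknown is absorbed
    print(classify(tree, (20.5, 0.5)))  # unknown: a separate unmodeled group
```

In the toy data, the training flow at (0.5, 0.5) labeled `unknown` lands in a leaf dominated by `web` flows and is therefore mapped to `web`, while the separate group of `unknown` flows keeps its own leaf. Real flow features would typically need scaling before Euclidean distances are meaningful; that step is omitted here.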