Existing network traffic classifiers often assume the availability of an ideal training dataset. In practice, however, the training dataset may contain a substantial number of flows labeled 'unknown', including both flows from classes that the classifier does not model and unrecognized flows from modeled classes. Such a training dataset seriously degrades the recall and generalization capability of existing classifiers, which treat unknowns as just another class. In this paper, we propose a semi-supervised multivariate decision tree classification algorithm based on adaptive hierarchical clustering. Rather than using the Gini index or information gain, which rely on a perfectly labeled training dataset, we use adaptive hierarchical clustering to construct the decision tree. The clustering process can identify unknown flows that belong to modeled classes, avoiding the pitfall of existing algorithms that treat them the same as real unknowns. After mapping each leaf cluster to a class based on the majority of its members and assigning decision rules based on cluster centers, we obtain a multivariate decision tree. Experimental results show that our algorithm significantly improves the recall of flows belonging to modeled classes compared with a decision tree classifier, with only a small impact on precision.
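The general idea of building a tree by clustering, labeling leaves by majority vote, and routing by cluster centers can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify the adaptive clustering procedure, so the sketch substitutes a simple bisecting 2-means split, and the purity threshold, depth limit, two-dimensional features, and all function names are assumptions made for illustration only.

```python
from collections import Counter

UNKNOWN = "unknown"  # label of flows the ground-truth tool could not identify


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def nearest(x, centers):
    """Index (0 or 1) of the center closest to x."""
    return 0 if dist2(x, centers[0]) <= dist2(x, centers[1]) else 1


def two_means(points, iters=10):
    """Stand-in clustering step (assumption): deterministic 2-means split."""
    c0 = points[0]
    centers = [c0, max(points, key=lambda p: dist2(p, c0))]
    for _ in range(iters):
        assign = [nearest(p, centers) for p in points]
        for k in (0, 1):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centers[k] = mean(members)
    return centers, [nearest(p, centers) for p in points]


def build(points, labels, depth=0, max_depth=8, purity=0.75):
    """Split until a cluster is dominated by one label (hypothetical rule)."""
    top_label, top_count = Counter(labels).most_common(1)[0]
    if top_count / len(labels) >= purity or depth >= max_depth:
        # Leaf: every member, including 'unknown' ones, takes the majority label.
        return {"label": top_label}
    centers, assign = two_means(points)
    if len(set(assign)) < 2:  # cluster cannot be split any further
        return {"label": top_label}
    children = []
    for k in (0, 1):
        sub_points = [p for p, a in zip(points, assign) if a == k]
        sub_labels = [l for l, a in zip(labels, assign) if a == k]
        children.append(build(sub_points, sub_labels, depth + 1, max_depth, purity))
    # Internal node: the decision rule compares distances to both child centers,
    # so it uses all features at once (a multivariate rule).
    return {"centers": centers, "children": children}


def classify(tree, x):
    """Route x down the tree by nearest child center and return the leaf label."""
    while "label" not in tree:
        tree = tree["children"][nearest(x, tree["centers"])]
    return tree["label"]


# Toy data with two hypothetical, pre-scaled flow features.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0)]
labels = ["web"] * 4 + [UNKNOWN] + ["p2p"] * 4 + [UNKNOWN] * 3
tree = build(points, labels)
print(classify(tree, (0.5, 0.5)), classify(tree, (20.5, 0.5)))
```

In this toy data the flow at (0.5, 0.5) is labeled 'unknown' but lies inside the web cluster, so it ends up in a leaf mapped to web, while the three unknown flows near (20, 0) form their own leaf and stay unknown. A split criterion such as information gain would instead have treated all four unknown flows as members of one class.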