Machine learning (ML) algorithms are widely used for traffic classification. However, because the number of flows per traffic type is imbalanced, ML-based classifiers tend to misclassify flows as the traffic type that accounts for the majority of flows on the Internet. To address this problem, we propose a novel feature selection metric named Weighted Symmetrical Uncertainty (WSU). We design a hybrid feature selection algorithm named WSU_AUC, which first filters out most features using the WSU metric and then applies a wrapper method that selects features for a specific classifier using the Area Under the ROC Curve (AUC) metric. Additionally, to reduce the impact of dynamic traffic flows on feature selection, we propose an algorithm named SRSF that Selects the Robust and Stable Features from the results produced by WSU_AUC. We evaluate our approaches using three classifiers on traces captured from entirely different networks. The experimental results are promising in terms of true positive rate (TPR) and false positive rate (FPR). Moreover, our algorithms achieve 94% flow accuracy and 80% byte accuracy on average.
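As a rough illustration of the filter stage, the sketch below computes the standard (unweighted) symmetrical uncertainty, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), between each discrete feature and the class label, then keeps the top-ranked features. The weighting that turns SU into WSU is not given in the abstract, so it is deliberately left out. The function names, the `prefilter` helper, and the toy feature names are hypothetical, and continuous flow features would need to be discretized first.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """Standard SU in [0, 1]; NOT the paper's weighted variant (WSU)."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so there is nothing to share
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy   # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def prefilter(features, labels, keep):
    """Hypothetical filter step: rank features by SU with the label, keep the best."""
    scores = {name: symmetrical_uncertainty(col, labels)
              for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:keep]


# Toy data: one feature that mirrors the class label and one unrelated to it.
labels = [0, 0, 1, 1]
features = {
    "informative": [0, 0, 1, 1],  # hypothetical feature name
    "noise": [0, 1, 0, 1],        # hypothetical feature name
}
selected = prefilter(features, labels, keep=1)
```

In a hybrid scheme of the kind the abstract describes, the features that survive such a filter would then be passed to a wrapper search that scores candidate subsets by the AUC of the target classifier.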