Feature selection for optimizing traffic classification

Authors:
Hongli Zhang;Gang Lu;Mahmoud T. Qassrawi;Yu Zhang;Xiangzhan Yu
Affiliations:
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, PR China (all authors)
Venue:
Computer Communications
Year:
2012

Abstract

Machine learning (ML) algorithms have been widely applied to traffic classification in recent years. However, because the number of flows per traffic type is imbalanced, ML-based classifiers are prone to misclassifying flows as the traffic type that accounts for the majority of flows on the Internet. To address this problem, a novel feature selection metric named Weighted Symmetrical Uncertainty (WSU) is proposed. We design a hybrid feature selection algorithm named WSU_AUC, which prefilters most features with the WSU metric and then uses a wrapper method to select features for a specific classifier with the Area Under the ROC Curve (AUC) metric. Additionally, to overcome the impact of dynamic traffic flows on feature selection, we propose an algorithm named SRSF that Selects the Robust and Stable Features from the results achieved by WSU_AUC. We evaluate our approaches using three classifiers on traces captured from entirely different networks. Experimental results obtained by our algorithms are promising in terms of true positive rate (TPR) and false positive rate (FPR). Moreover, our algorithms achieve 94% flow accuracy and 80% byte accuracy on average.
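
The abstract does not define how WSU weights its terms, so the sketch below only illustrates the standard, unweighted symmetrical uncertainty that the metric's name builds on: SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)). It assumes features that are already discretized; the function names `entropy` and `symmetrical_uncertainty` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(feature, labels):
    """Standard (unweighted) SU between a discrete feature and class labels.

    This is NOT the paper's WSU: the class weighting used by WSU is not
    described in the abstract and is deliberately left out here.
    """
    h_x = entropy(feature)
    h_y = entropy(labels)
    if h_x + h_y == 0:
        # Both variables are constant, so there is nothing to measure.
        return 0.0
    # Joint entropy H(X, Y) computed over (feature, label) pairs.
    h_xy = entropy(list(zip(feature, labels)))
    mutual_info = h_x + h_y - h_xy
    return 2.0 * mutual_info / (h_x + h_y)
```

A filter stage in the spirit of the prefiltering step would score every feature this way and keep only the highest-ranked ones before handing them to a classifier-specific wrapper. A feature that perfectly predicts the class scores 1, and a feature that is independent of the class scores 0.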