Learning on class imbalanced data to classify peer-to-peer applications in IP traffic using resampling techniques

Authors: Weicai Zhong; Bijan Raahemi; Jing Liu
Affiliations: Weicai Zhong and Bijan Raahemi: Telfer School of Management, University of Ottawa, Ottawa, ON, Canada; Jing Liu: Institute of Intelligent Information Processing, Xidian University, Xi'an, Shaanxi, China
Venue: IJCNN'09: Proceedings of the 2009 International Joint Conference on Neural Networks
Year: 2009

Abstract

In many applications, one class of data is represented by a large number of examples and the other by only a few. For instance, in our previous work on identifying peer-to-peer (P2P) Internet traffic, we observed that only about 30% of examples could be labeled as "P2P" using a port-based heuristic rule, and even fewer examples will be identifiable this way as more and more P2P applications use dynamic ports. In this paper, we study the effect of three resampling techniques for balancing the class distribution when training C4.5 decision trees and neural networks to identify P2P traffic. The experimental data were captured at our campus gateway. The experiments use nine datasets with different percentages of "P2P" examples and six datasets of different sizes, each with the actual percentage of about 30% "P2P" examples. The results show that the resampling techniques are effective and stable, and that random over-sampling is a good choice for P2P traffic identification when both classification performance and time complexity are taken into account.
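As a rough illustration of the technique the abstract singles out, random over-sampling can be sketched as follows. Examples of each smaller class are drawn at random, with replacement, and appended until every class is as large as the biggest one. This is a minimal sketch that assumes in-memory feature vectors and labels. The function name `random_oversample`, the toy two-feature flows, and the "P2P"/"NonP2P" labels are hypothetical, and the paper's exact procedure (for example, its target class ratio) may differ.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Balance classes by random over-sampling (hypothetical helper).

    Examples of every smaller class are drawn uniformly at random, with
    replacement, and appended until that class is as large as the biggest
    class. All original examples are kept; the result is shuffled.
    """
    rng = random.Random(seed)
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)

    target = max(len(examples) for examples in by_class.values())
    pairs = []
    for label, examples in by_class.items():
        # Keep every original example, then add random duplicates.
        extra = [rng.choice(examples) for _ in range(target - len(examples))]
        pairs.extend((features, label) for features in examples + extra)

    rng.shuffle(pairs)  # avoid long runs of one class in the training order
    X_out = [features for features, _ in pairs]
    y_out = [label for _, label in pairs]
    return X_out, y_out


if __name__ == "__main__":
    # Hypothetical toy flows: 3 "P2P" vs 7 "NonP2P" (30% minority class).
    X = [[float(i), float(10 - i)] for i in range(10)]
    y = ["P2P"] * 3 + ["NonP2P"] * 7
    X_bal, y_bal = random_oversample(X, y)
    counts = Counter(y_bal)
    print(counts["P2P"], counts["NonP2P"])  # prints: 7 7
```

Because random over-sampling only duplicates existing rows, it adds almost no cost before training, which is consistent with the abstract's point that it offers a good trade-off between classification performance and time complexity. The balanced output would then be passed to the learner (C4.5 or a neural network) in place of the original training set.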