Expert Systems with Applications: An International Journal
In many applications, one class of data is represented by a large number of examples while the other is represented by only a few. For instance, in our previous work on identifying peer-to-peer (P2P) Internet traffic, we observed that only about 30% of examples can be labeled as "P2P" using a port-based heuristic rule, and even fewer will be labeled this way in the future as more and more P2P applications use dynamic ports. In this paper, we study the effect of three resampling techniques on balancing the class distribution when training C4.5 and neural networks to identify P2P traffic. The experimental data were captured at our campus gateway. The experiments use nine datasets with different percentages of "P2P" examples and six datasets of different sizes, each with the actual percentage of about 30% "P2P" examples. The results show that resampling techniques are effective and stable, and that random over-sampling is a good choice for P2P traffic identification when both classification performance and time complexity are considered.
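As an illustration of the random over-sampling mentioned above, the following is a minimal sketch that assumes the common formulation: minority-class examples are duplicated at random, with replacement, until they reach a chosen share of the training set. The abstract does not describe the exact procedure used in the experiments, so the function name `random_oversample` and the parameters `target_fraction` and `seed` are hypothetical and chosen only for this example.

```python
import math
import random


def random_oversample(examples, labels, minority_label,
                      target_fraction=0.5, seed=0):
    """Duplicate randomly chosen minority examples until the minority
    class makes up at least `target_fraction` of the returned data.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if not 0 < target_fraction < 1:
        raise ValueError("target_fraction must be strictly between 0 and 1")

    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    if not minority_idx:
        raise ValueError("no examples carry the minority label")

    n_majority = len(labels) - len(minority_idx)
    # Smallest minority count m with m / (m + n_majority) >= target_fraction.
    needed = math.ceil(target_fraction * n_majority / (1 - target_fraction))
    n_extra = max(0, needed - len(minority_idx))

    # Sample with replacement from the existing minority examples.
    extra_idx = [rng.choice(minority_idx) for _ in range(n_extra)]

    # Keep every original example, append the duplicates, then shuffle.
    all_idx = list(range(len(labels))) + extra_idx
    rng.shuffle(all_idx)
    return [examples[i] for i in all_idx], [labels[i] for i in all_idx]


# Toy data mirroring the roughly 30% "P2P" share described in the abstract.
flows = [f"flow{i}" for i in range(10)]
flow_labels = ["P2P"] * 3 + ["other"] * 7
balanced_flows, balanced_labels = random_oversample(flows, flow_labels, "P2P")
```

Over-sampling adds no new information and only reweights the minority class, so it would normally be applied to the training split alone, leaving the test data at its original class distribution.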