P2P traffic classification using ensemble learning

Authors:
Jagan Mohan Reddy;Chittaranjan Hota
Affiliations:
Birla Institute of Technology and Science-Pilani, A.P., India;Birla Institute of Technology and Science-Pilani, A.P., India
Venue:
Proceedings of the 5th IBM Collaborative Academia Research Exchange Workshop
Year:
2013

Abstract

Early Peer-to-Peer (P2P) overlay network traffic classification schemes relied on port-based and payload-based inspection. In recent years, researchers have turned to alternative machine learning approaches. This paper applies ensemble learning, which combines multiple models to achieve higher prediction accuracy than a single classifier or semi-supervised learning techniques. We first extract statistical characteristics of TCP and UDP flows from network traces to construct a feature set. We then apply feature selection techniques to reduce the number of features required to train the model, which in turn reduces the build time. We use the Stacking and Voting ensemble learning techniques to improve prediction accuracy, with base classifiers built from three Machine Learning (ML) algorithms: Naïve Bayes, Bayesian Network, and Decision Trees. Meta classifiers further improve classification accuracy to 99.9%. Our experimental results show that Stacking performs better than Voting in identifying P2P traffic.
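As a rough illustration of two steps named in the abstract, the sketch below groups packets into flows and computes a few per-flow statistics, then combines base-classifier outputs by hard (majority) voting. It is not the authors' implementation: the packet tuple layout, the unidirectional 5-tuple flow key, the particular statistics, and the function names `flow_features` and `majority_vote` are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from statistics import mean


def flow_features(packets):
    """Group packets into flows and compute simple per-flow statistics.

    Assumed (hypothetical) packet layout:
    (timestamp, src_ip, src_port, dst_ip, dst_port, protocol, size_bytes).
    Flows are keyed by the unidirectional 5-tuple.
    """
    flows = defaultdict(list)
    for ts, src, sport, dst, dport, proto, size in packets:
        flows[(src, sport, dst, dport, proto)].append((ts, size))

    features = {}
    for key, pkts in flows.items():
        pkts.sort()  # order packets by timestamp
        times = [t for t, _ in pkts]
        sizes = [s for _, s in pkts]
        gaps = [b - a for a, b in zip(times, times[1:])]
        features[key] = {
            "packet_count": len(pkts),
            "total_bytes": sum(sizes),
            "mean_packet_size": mean(sizes),
            "duration": times[-1] - times[0],
            # A single-packet flow has no inter-arrival gaps.
            "mean_inter_arrival": mean(gaps) if gaps else 0.0,
        }
    return features


def majority_vote(labels):
    """Hard voting: return the most common label among base-classifier outputs."""
    return Counter(labels).most_common(1)[0][0]


# Made-up packets for illustration only.
packets = [
    (0.0, "10.0.0.1", 6881, "10.0.0.2", 51413, "UDP", 100),
    (0.5, "10.0.0.1", 6881, "10.0.0.2", 51413, "UDP", 300),
    (1.5, "10.0.0.1", 6881, "10.0.0.2", 51413, "UDP", 200),
    (0.2, "10.0.0.3", 44321, "10.0.0.4", 80, "TCP", 1500),
]
feats = flow_features(packets)
udp_flow = feats[("10.0.0.1", 6881, "10.0.0.2", 51413, "UDP")]

# Labels that three hypothetical base classifiers might emit for one flow.
decision = majority_vote(["P2P", "non-P2P", "P2P"])
```

Voting combines the base classifiers' outputs with a fixed rule, as above. Stacking instead trains a meta classifier on those outputs, which this sketch does not attempt.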