Offline/realtime traffic classification using semi-supervised learning

Authors:
Jeffrey Erman;Anirban Mahanti;Martin Arlitt;Ira Cohen;Carey Williamson
Affiliations:
Department of Computer Science, University of Calgary, Canada;Department of Computer Science and Engineering, Indian Institute of Technology, Delhi, India;Department of Computer Science, University of Calgary, Canada and Enterprise Systems and Software Lab, HP Labs, Palo Alto, USA;Enterprise Systems and Software Lab, HP Labs, Palo Alto, USA;Department of Computer Science, University of Calgary, Canada
Venue:
Performance Evaluation
Year:
2007

Abstract

Identifying and categorizing network traffic by application type is challenging because of the continued evolution of applications, especially of those with a desire to be undetectable. The diminished effectiveness of port-based identification and the overheads of deep packet inspection approaches motivate us to classify traffic by exploiting distinctive flow characteristics of applications when they communicate on a network. In this paper, we explore this latter approach and propose a semi-supervised classification method that can accommodate both known and unknown applications. To the best of our knowledge, this is the first work to use semi-supervised learning techniques for the traffic classification problem. Our approach allows classifiers to be designed from training data that consists of only a few labeled and many unlabeled flows. We consider pragmatic classification issues such as longevity of classifiers and the need for retraining of classifiers. Our performance evaluation using empirical Internet traffic traces that span a 6-month period shows that: (1) high flow and byte classification accuracy (i.e., greater than 90%) can be achieved using training data that consists of a small number of labeled and a large number of unlabeled flows; (2) presence of "mice" and "elephant" flows in the Internet complicates the design of classifiers, especially of those with high byte accuracy, and necessitates the use of weighted sampling techniques to obtain training flows; and (3) retraining of classifiers is necessary only when there are non-transient changes in the network usage characteristics. As a proof of concept, we implement prototype offline and realtime classification systems to demonstrate the feasibility of our approach.
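
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a classifier could be built from a few labeled and many unlabeled flows: cluster all flows on their statistics, name each cluster by majority vote among the labeled flows that fall into it, and treat clusters with no labeled flows as unknown applications. The use of k-means, the two flow features, the toy flow values, and every function name here are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means, seeded deterministically from evenly spaced points.

    Assumes len(points) >= k.
    """
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def label_clusters(centroids, labeled_flows):
    """Name each cluster after the most common application among its labeled flows."""
    votes = [Counter() for _ in centroids]
    for features, app in labeled_flows:
        votes[nearest(features, centroids)][app] += 1
    return [v.most_common(1)[0][0] if v else "unknown" for v in votes]


def classify(features, centroids, cluster_labels):
    """Assign a flow the label of its nearest cluster."""
    return cluster_labels[nearest(features, centroids)]


# Hypothetical flows as (feature_1, feature_2) pairs, e.g. scaled size and duration.
all_flows = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),   # behaves like one application
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2),   # behaves like another
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.9), (9.2, 1.1),   # never labeled by anyone
]
# Only two of the twelve flows carry a label.
labeled_flows = [((1.0, 1.0), "web"), ((5.2, 4.9), "p2p")]

centroids = kmeans(all_flows, k=3)
cluster_labels = label_clusters(centroids, labeled_flows)
print(cluster_labels)
print(classify((5.0, 5.1), centroids, cluster_labels))
```

In this toy run two labels are enough to name two of the three clusters, and the third stays "unknown", which mirrors the idea of accommodating both known and unknown applications. A practical system would need many more features and a principled choice of the number of clusters.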