A supervised machine learning approach to classify host roles on line using sFlow

Authors:
Bingdong Li;Mehmet Hadi Gunes;George Bebis;Jeff Springer
Affiliations:
University of Nevada Reno, Reno, NV, USA;University of Nevada Reno, Reno, NV, USA;University of Nevada Reno, Reno, NV, USA;University of Nevada Reno, Reno, NV, USA
Venue:
Proceedings of the first edition workshop on High performance and programmable networking
Year:
2013

Abstract

Classifying host roles based on network traffic behavior is valuable for network security analysis and for detecting security policy violations. Behavior-based network security analysis has advantages over traditional approaches that rely on code patterns or signatures. Modeling host roles from network flow data is challenging because of the huge volume of network traffic and the overlap among host roles. Many studies of network traffic classification have focused on classifying applications such as web, peer-to-peer, and DNS traffic. In general, machine learning approaches have been applied to application classification, security awareness, and anomaly detection. In this paper, we present a supervised machine learning approach that uses On-Line Support Vector Machine and Decision Tree to classify host roles. We collect sFlow data from the main gateways of a large campus network. We classify different roles, namely, clients versus servers, regular web non-email servers versus web email servers, clients at personal offices versus public places (laboratories and libraries), and personal office clients from two different colleges. We achieved very high classification accuracy: 99.2% in classifying clients versus servers, 100% in classifying regular web non-email servers versus web email servers, 93.3% in classifying clients at personal offices versus public places, and 93.3% in classifying clients at personal offices from two different colleges.
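
The general idea in the abstract (summarize flow samples per host, then train a classifier incrementally) can be illustrated with a minimal sketch. This is not the authors' implementation. A plain linear SVM updated by stochastic subgradient steps on the hinge loss stands in for the On-Line Support Vector Machine. The flow tuple layout `(src_ip, dst_ip, src_port, dst_port)`, the two features, the hyperparameters, and the synthetic training data are all assumptions made for illustration.

```python
import random


def host_features(flows, host):
    """Summarize flow samples for one host as [low_port_ratio, peer_ratio].

    Each flow is a hypothetical (src_ip, dst_ip, src_port, dst_port) tuple;
    the layout and both features are illustrative assumptions.
    """
    local_ports, peers = [], set()
    for src, dst, sport, dport in flows:
        if src == host:
            local_ports.append(sport)
            peers.add(dst)
        elif dst == host:
            local_ports.append(dport)
            peers.add(src)
    if not local_ports:
        return [0.0, 0.0]
    total = len(local_ports)
    # Share of the host's flows whose local port is a well-known port (< 1024).
    low_port_ratio = sum(p < 1024 for p in local_ports) / total
    # Number of distinct peers relative to the number of flows seen.
    return [low_port_ratio, len(peers) / total]


class OnlineLinearSVM:
    """Linear SVM trained one sample at a time (hinge-loss subgradient steps)."""

    def __init__(self, n_features, lam=0.01, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lam = lam  # L2 regularization strength (assumed value)
        self.lr = lr    # learning rate (assumed value)

    def decision(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def partial_fit(self, x, y):
        """Update on a single example; y is +1 (server) or -1 (client)."""
        margin = y * self.decision(x)
        # Regularization step: shrink the weights slightly.
        self.w = [wi * (1.0 - self.lr * self.lam) for wi in self.w]
        # Hinge-loss step: only examples inside the margin move the model.
        if margin < 1.0:
            self.w = [wi + self.lr * y * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * y

    def predict(self, x):
        return 1 if self.decision(x) >= 0.0 else -1


def synthetic_host(rng, is_server):
    """Draw a made-up feature vector: servers mostly use well-known local ports."""
    base = 0.8 if is_server else 0.0
    features = [base + 0.2 * rng.random(), rng.random()]
    return features, (1 if is_server else -1)


def train_and_evaluate(seed=0, n_train=600, n_test=200):
    """Stream synthetic hosts through the model, then measure held-out accuracy."""
    rng = random.Random(seed)
    model = OnlineLinearSVM(n_features=2)
    for _ in range(n_train):
        x, y = synthetic_host(rng, rng.random() < 0.5)
        model.partial_fit(x, y)
    hits = 0
    for _ in range(n_test):
        x, y = synthetic_host(rng, rng.random() < 0.5)
        hits += model.predict(x) == y
    return hits / n_test


if __name__ == "__main__":
    print("held-out accuracy on synthetic hosts:", train_and_evaluate())
```

The synthetic classes are separable by construction, so the accuracy this sketch reports says nothing about the figures in the abstract. It only shows how a per-host feature vector and an incremental update fit together.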