Link homophily in the application layer and its usage in traffic classification

Authors:
Brian Gallagher;Marios Iliofotou;Tina Eliassi-Rad;Michalis Faloutsos
Affiliations:
Lawrence Livermore National Laboratory;University of California Riverside;Lawrence Livermore National Laboratory;University of California Riverside
Venue:
INFOCOM'10 Proceedings of the 29th conference on Information communications
Year:
2010



Abstract

We address two questions. Is there link homophily in application-layer traffic? If so, can it be used to accurately classify traffic in network trace data without relying on payloads or flow-level properties? Our results on real network trace data show that the answer to both questions is yes. Specifically, we define link homophily as the tendency of flows that share IP hosts to belong to the same application (P2P, Web, etc.) more often than randomly selected flows do. The presence of link homophily in trace data provides statistical dependencies between flows that share IP hosts. We use these dependencies to classify application-layer traffic without relying on payloads or flow-level properties. In particular, we introduce a new statistical relational learning algorithm, called Neighboring Link Classifier with Relaxation Labeling (NLC+RL). The algorithm has no training phase and requires no feature construction. All it needs to start the classification process is traffic information on a small portion of the initial flows, which we refer to as seeds. In all our traces, NLC+RL achieves above 90% accuracy with a seed size below 5%; it is robust to errors in the seeds and to various seed-selection biases; and it accurately classifies challenging traffic such as P2P, with over 90% precision and recall.
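
To make the two ideas in the abstract concrete, here is a minimal Python sketch. It is not the authors' NLC+RL implementation, whose details the abstract does not give. It rests on several assumptions: a flow is a (source IP, destination IP) pair, two flows are neighbors when they share an IP host, "randomly selected flows" is approximated by all flow pairs, and each unlabeled flow repeatedly takes the average of its neighbors' class distributions while seeds stay fixed. The function names, the toy flows, and the iteration count are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def neighbors_by_shared_host(flows):
    """Map each flow id to the set of other flows sharing one of its IP hosts."""
    by_host = defaultdict(set)
    for fid, (src, dst) in flows.items():
        by_host[src].add(fid)
        by_host[dst].add(fid)
    return {fid: (by_host[src] | by_host[dst]) - {fid}
            for fid, (src, dst) in flows.items()}


def link_homophily(flows, labels):
    """Return (same-app rate among host-sharing pairs, same-app rate among all pairs)."""
    nbrs = neighbors_by_shared_host(flows)
    pairs = list(combinations(sorted(flows), 2))
    linked = [(a, b) for a, b in pairs if b in nbrs[a]]

    def same_rate(ps):
        return sum(labels[a] == labels[b] for a, b in ps) / len(ps) if ps else 0.0

    return same_rate(linked), same_rate(pairs)


def classify_from_seeds(flows, seeds, classes, iterations=10):
    """Assumed update rule: unlabeled flows average their neighbors' beliefs."""
    nbrs = neighbors_by_shared_host(flows)
    uniform = {c: 1.0 / len(classes) for c in classes}
    # Seeds start (and stay) as point masses; every other flow starts uniform.
    belief = {fid: ({c: float(c == seeds[fid]) for c in classes}
                    if fid in seeds else dict(uniform))
              for fid in flows}
    for _ in range(iterations):
        updated = {}
        for fid in flows:
            if fid in seeds or not nbrs[fid]:
                updated[fid] = belief[fid]  # seeds and isolated flows are left as-is
                continue
            n = len(nbrs[fid])
            updated[fid] = {c: sum(belief[nb][c] for nb in nbrs[fid]) / n
                            for c in classes}
        belief = updated  # synchronous update: all flows change together
    # Ties (e.g. isolated unlabeled flows) fall back to the first class listed.
    return {fid: max(classes, key=lambda c: belief[fid][c]) for fid in flows}


# Toy trace: three P2P flows among hosts A, B, C and two Web flows to server W.
flows = {"f1": ("A", "B"), "f2": ("A", "C"), "f3": ("B", "C"),
         "f4": ("X", "W"), "f5": ("Y", "W")}
true_labels = {"f1": "P2P", "f2": "P2P", "f3": "P2P", "f4": "Web", "f5": "Web"}
seeds = {"f1": "P2P", "f4": "Web"}

linked_rate, overall_rate = link_homophily(flows, true_labels)
predicted = classify_from_seeds(flows, seeds, classes=["P2P", "Web"])
```

In this toy trace every pair of host-sharing flows carries the same application, while only 4 of the 10 possible flow pairs do, so the linked rate exceeds the overall rate. That gap is the kind of dependency a neighbor-based classifier can exploit: starting from the two seeds, the sketch labels the three remaining flows correctly.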