Cluster analysis of traffic flows on a campus network

Authors:
Asim Karim;Irfan Ahmad;Syed Imran Jami;Mansoor Sarwar
Affiliations:
Dept. of Computer Science, Lahore University of Management Sciences, Lahore, Pakistan;School of Science and Technology, University of Management and Technology, Gulberg III, Lahore, Pakistan;Dept. of Computer Science, NUCES (FAST), Shah Latif Town, Karachi, Pakistan;School of Science and Technology, University of Management and Technology, Gulberg III, Lahore, Pakistan
Venue:
AIA'06 Proceedings of the 24th IASTED international conference on Artificial intelligence and applications
Year:
2006



Abstract

University campus networks generate large quantities of traffic flow data. These data record the source and destination of each flow as IP addresses. Cluster analysis of such data can reveal knowledge useful for web cache design, user profiling, and network resource management. However, popular clustering algorithms such as k-means and DBSCAN are not directly applicable to datasets containing IP addresses. Moreover, such generic algorithms can yield results that are difficult to interpret.

This paper presents a cluster analysis of network traffic flows using a hybrid clustering algorithm. The algorithm integrates the longest prefix matching concept of TCP/IP traffic routing with the nearest neighbor algorithm. The similarity between IP addresses is determined by the longest prefix match, and similar IP addresses are then grouped together by an adapted version of the nearest neighbor algorithm. The algorithm clusters automatically, without requiring input parameters such as the desired number of clusters or a similarity threshold value. Furthermore, it yields 'natural' clusters consistent with the characteristics and usage of IP addresses. The test results are verified using nslookup: about 90% of the clusters were correctly identified by the algorithm.
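The abstract does not give the details of the adapted nearest neighbor step, so the following Python sketch is only one possible reading of the idea, not the authors' algorithm. It assumes IPv4 addresses, measures similarity as the number of leading bits two addresses share (the longest common prefix), and links every address to its most similar neighbor; the resulting connected groups are treated as clusters. The function names `common_prefix_len` and `nearest_neighbor_clusters` are hypothetical and chosen for illustration.

```python
import ipaddress
from collections import defaultdict


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses (0 to 32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    # The highest set bit of the XOR marks the first position where they differ.
    return 32 - diff.bit_length()


def nearest_neighbor_clusters(addresses):
    """Group addresses by linking each one to its longest-prefix neighbor.

    Assumption (not from the paper): clusters are the connected groups
    formed by these nearest-neighbor links, so no cluster count or
    similarity threshold is supplied by the caller.
    """
    addrs = sorted(set(addresses), key=lambda s: int(ipaddress.IPv4Address(s)))
    parent = {a: a for a in addrs}

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a in addrs:
        best, best_len = None, -1
        for b in addrs:
            if a == b:
                continue
            shared = common_prefix_len(a, b)
            if shared > best_len:
                best, best_len = b, shared
        if best is not None:
            parent[find(a)] = find(best)  # merge a's group with its neighbor's

    groups = defaultdict(list)
    for a in addrs:
        groups[find(a)].append(a)
    return sorted(groups.values())


if __name__ == "__main__":
    sample = ["192.168.1.10", "10.0.0.2", "192.168.1.20", "10.0.0.1"]
    for cluster in nearest_neighbor_clusters(sample):
        print(cluster)
```

This sketch compares all pairs, so it scales quadratically and is suited only to small samples. It also links every address to some neighbor even when the shared prefix is very short, which the paper's adapted version presumably handles differently.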