Large quantities of network traffic flow data are generated on university campus networks. These data record the source and destination of each flow as IP addresses. Cluster analysis of such data can reveal knowledge useful for web cache design, user profiling, and network resource management. However, popular clustering algorithms such as k-means and DBSCAN are not directly applicable to datasets containing IP addresses. Moreover, such generic algorithms can yield results that are difficult to interpret.

This paper presents a cluster analysis of network traffic flows using a hybrid clustering algorithm. The algorithm integrates the longest prefix matching concept used in TCP/IP routing with the nearest neighbor algorithm. The similarity between two IP addresses is determined by their longest prefix match. Similar IP addresses are then grouped together by an adapted version of the nearest neighbor algorithm. The algorithm clusters automatically: it requires no input parameters such as the desired number of clusters or a similarity threshold. Furthermore, it yields 'natural' clusters consistent with the characteristics and usage of IP addresses. The test results were verified using nslookup; about 90% of the clusters were correctly identified by the algorithm.
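To make the idea of prefix-based similarity concrete, the following is a minimal Python sketch, not the authors' algorithm. The abstract does not specify how the nearest neighbor algorithm was adapted, so the grouping rule here (link each address to the address with which it shares the most leading bits, then take the connected groups) is an assumption chosen only because it needs neither a cluster count nor a threshold. The function names `common_prefix_len` and `nearest_neighbor_clusters` are hypothetical, and the sketch handles IPv4 only.

```python
import ipaddress


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses (0 to 32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    # The highest set bit of the XOR marks the first position where they differ.
    return 32 - diff.bit_length()


def nearest_neighbor_clusters(addrs):
    """Assumed grouping rule: join every address with its longest-prefix
    nearest neighbor and return the resulting connected groups."""
    addrs = list(dict.fromkeys(addrs))  # drop duplicates, keep input order
    parent = {a: a for a in addrs}

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a in addrs:
        others = [b for b in addrs if b != a]
        if not others:
            continue
        # Nearest neighbor = address sharing the longest prefix with `a`.
        best = max(others, key=lambda b: common_prefix_len(a, b))
        parent[find(a)] = find(best)

    groups = {}
    for a in addrs:
        groups.setdefault(find(a), []).append(a)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    flows = ["10.0.0.1", "10.0.0.2", "192.168.1.5", "192.168.1.9"]
    for cluster in nearest_neighbor_clusters(flows):
        print(cluster)
```

One limitation of this simplified rule is that an isolated address is always attached to some group, even when it shares no leading bits with any other address; a practical adaptation would need a way to leave such addresses unclustered.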