Clustering web users based on their access patterns is an important task in Web Usage Mining. Beyond clustering itself, it is important to evaluate the resulting clusters in order to choose the best clustering for a particular framework. This paper examines the use of Kullback-Leibler (KL) divergence, an information-theoretic distance, in conjunction with the k-means clustering algorithm. It compares KL divergence with other well-known distance measures (Euclidean, Standardized Euclidean and Manhattan) and evaluates the clustering results using both the value of the objective function and the Davies-Bouldin index. Since it is imperative to assess whether the results of a clustering process are susceptible to noise, especially in noisy environments such as the Web, our approach takes the impact of noise into account. The clusters obtained with the KL approach appear superior to those obtained with the other distance measures when the data have been corrupted by noise.
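The setup described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes each user is represented as a probability distribution over page categories (the abstract does not specify the representation), takes the objective to be the sum of distances from each point to its assigned centroid, and applies the asymmetric KL divergence directly when measuring centroid separation in the Davies-Bouldin index. All function names (`kl_divergence`, `kmeans`, `davies_bouldin`) and the synthetic data are hypothetical.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps smoothing avoids log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def euclidean(p, q):
    """Plain Euclidean distance, used here as the baseline measure."""
    return float(np.linalg.norm(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)))

def kmeans(X, k, dist, n_iter=50, seed=0):
    """Lloyd-style k-means with a pluggable distance dist(point, centroid)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initialise centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        d = np.array([[dist(x, c) for c in centroids] for x in X])
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                # The arithmetic mean minimises both the summed squared
                # Euclidean distance and the summed KL(x || c).
                centroids[j] = members.mean(axis=0)
    # Assumed objective: total distance of points to their centroids.
    objective = float(sum(dist(x, centroids[l]) for x, l in zip(X, labels)))
    return labels, centroids, objective

def davies_bouldin(X, labels, centroids, dist):
    """Davies-Bouldin index: lower values indicate compact, separated clusters."""
    X = np.asarray(X, dtype=float)
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = X[labels == j]
        # Average within-cluster distance (0.0 for an empty cluster).
        scatter.append(np.mean([dist(x, centroids[j]) for x in members]) if len(members) else 0.0)
    worst = []
    for i in range(k):
        # Assumes distinct centroids; KL is asymmetric, so this is a simplification.
        ratios = [(scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                  for j in range(k) if j != i]
        worst.append(max(ratios))
    return float(np.mean(worst))

if __name__ == "__main__":
    # Synthetic "users": two groups with different category preferences.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.dirichlet([20, 20, 1, 1], size=30),
                   rng.dirichlet([1, 1, 20, 20], size=30)])
    for name, dist in [("KL", kl_divergence), ("Euclidean", euclidean)]:
        labels, centroids, obj = kmeans(X, k=2, dist=dist)
        print(name, "objective:", obj, "DB index:", davies_bouldin(X, labels, centroids, dist))
```

Because the distance is passed in as a function, the same loop can be rerun with other measures, or on data with added noise, to reproduce the kind of comparison the abstract describes. Objective and index values computed under different distances are on different scales, so they are best compared per measure across conditions.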