A divergence-oriented approach for web users clustering

Authors:
Sophia G. Petridou;Vassiliki A. Koutsonikola;Athena I. Vakali;Georgios I. Papadimitriou
Affiliations:
Dept of Informatics Aristotle University, Thessaloniki, Greece;Dept of Informatics Aristotle University, Thessaloniki, Greece;Dept of Informatics Aristotle University, Thessaloniki, Greece;Dept of Informatics Aristotle University, Thessaloniki, Greece
Venue:
ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part II
Year:
2006

Citing 9
Cited 1

Algorithms for clustering data
Fast and effective text mining using linear-time document clustering

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Clustering validity checking methods: part II

ACM SIGMOD Record
Enhanced word clustering for hierarchical text classification

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Creating Adaptive Web Sites Through Usage-Based Clustering of URLs

KDEX '99 Proceedings of the 1999 Workshop on Knowledge and Data Engineering Exchange
Web usage mining: discovery and applications of usage patterns from Web data

ACM SIGKDD Explorations Newsletter
A divisive information theoretic feature clustering algorithm for text classification

The Journal of Machine Learning Research
Guest Editors' Introduction: Web Engineering--The Evolution of New Technologies

Computing in Science and Engineering
Cluster Validity Indices for Graph Partitioning

IV '04 Proceedings of the Information Visualisation, Eighth International Conference

The SKM Algorithm: A K-Means Algorithm for Clustering Sequential Data

IBERAMIA '08 Proceedings of the 11th Ibero-American conference on AI: Advances in Artificial Intelligence

Abstract

Clustering web users based on their access patterns is an important task in Web Usage Mining. Beyond the clustering itself, it is important to evaluate the resulting clusters in order to choose the best clustering for a particular framework. This paper examines the use of Kullback-Leibler (KL) divergence, an information-theoretic distance, in conjunction with the k-means clustering algorithm. It compares KL divergence with other well-known distance measures (Euclidean, Standardized Euclidean and Manhattan) and evaluates the clustering results using both the value of the objective function and the Davies-Bouldin index. Because clustering results may be susceptible to noise, especially in noisy settings such as the Web, the approach also takes the impact of noise into account. The clusters obtained with the KL approach appear superior to those obtained with the other distance measures when the data have been corrupted by noise.
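The combination described in the abstract (k-means driven by KL divergence, scored by the objective value and the Davies-Bouldin index) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The following are assumptions made only for illustration: each user is a row of page-visit counts normalized to a probability distribution, the divergence is taken as KL(user || centroid), centroids are arithmetic means, a small `eps` smooths zero entries, and the Davies-Bouldin index is computed with the same (asymmetric) dissimilarity. The names `normalize_rows`, `kmeans`, `init_idx` and the toy data are hypothetical.

```python
import numpy as np

def normalize_rows(counts):
    # Assumption: every user has at least one visit, so row sums are > 0.
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) with additive smoothing so zero entries do not yield log(0).
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def assign(P, centroids, dist):
    # Label each row with its nearest centroid; also return the objective
    # (sum of dissimilarities between rows and their assigned centroids).
    d = np.array([[dist(p, c) for c in centroids] for p in P])
    return d.argmin(axis=1), float(d.min(axis=1).sum())

def kmeans(P, k, dist, n_iter=100, seed=0, init_idx=None):
    # init_idx (hypothetical parameter) fixes the initial centroids by row index.
    if init_idx is None:
        init_idx = np.random.default_rng(seed).choice(len(P), size=k, replace=False)
    centroids = P[np.asarray(init_idx)].copy()
    for _ in range(n_iter):
        labels, _ = assign(P, centroids, dist)
        # Mean update; an empty cluster keeps its previous centroid.
        new = np.array([P[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels, objective = assign(P, centroids, dist)
    return labels, centroids, objective

def davies_bouldin(P, labels, centroids, dist):
    # Assumes k >= 2 and no empty clusters. Lower values indicate
    # more compact, better separated clusters.
    k = len(centroids)
    scatter = [np.mean([dist(p, centroids[j]) for p in P[labels == j]])
               for j in range(k)]
    worst = [max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                 for j in range(k) if j != i) for i in range(k)]
    return float(np.mean(worst))

# Toy data (made up): six users, visit counts over three pages.
counts = np.array([[10, 1, 1], [9, 2, 1], [8, 1, 1],
                   [1, 1, 10], [1, 2, 9], [1, 1, 8]])
P = normalize_rows(counts)
labels, centroids, objective = kmeans(P, 2, kl_divergence, init_idx=[0, 3])
db = davies_bouldin(P, labels, centroids, kl_divergence)
print(labels, objective, db)
```

Because `dist` is a parameter, another measure such as Euclidean or Manhattan distance could be passed in for a rough side-by-side comparison of the objective and Davies-Bouldin values on the same data.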