MPM: a hierarchical clustering algorithm using matrix partitioning method for non-numeric data
Journal of Intelligent Information Systems
Evaluation of Text Clustering Algorithms with N-Gram-Based Document Fingerprints
ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval
Discovering Knowledge-Sharing Communities in Question-Answering Forums
ACM Transactions on Knowledge Discovery from Data (TKDD)
A practical approach for clustering transaction data
MLDM'11 Proceedings of the 7th international conference on Machine learning and data mining in pattern recognition
We present a partitioning method for clustering web log sessions. Sessions can be treated as transactions, i.e., variable-size tuples of categorical data. We adapt the standard mathematical distance used in the K-Means algorithm so that it expresses dissimilarity between transactions, and we redefine the notion of cluster centroid accordingly. The cluster centroid serves as the representative of the properties common to the elements of a cluster. We show that our concept of cluster centroid, combined with the Jaccard distance, yields results comparable in quality to those of standard approaches while substantially improving on their efficiency.
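As an illustration of the idea, the sketch below runs a K-Means-style loop over transactions represented as sets, using the Jaccard distance d(A, B) = 1 - |A ∩ B| / |A ∪ B|. The abstract does not state how the centroid is defined, so the `centroid` function here is an assumption: it keeps every item that occurs in at least a given share of the cluster's transactions. The names `jaccard_distance`, `centroid`, `kmeans_transactions`, and `min_share` are hypothetical and not taken from the paper.

```python
from collections import Counter


def jaccard_distance(a, b):
    """Jaccard distance between two transactions treated as item sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention: two empty transactions are identical
    return 1.0 - len(a & b) / len(a | b)


def centroid(cluster, min_share=0.5):
    """Assumed centroid: items present in at least min_share of the members.

    This frequency rule is an illustrative stand-in, not the paper's definition.
    """
    counts = Counter(item for t in cluster for item in set(t))
    threshold = min_share * len(cluster)
    return frozenset(item for item, c in counts.items() if c >= threshold)


def kmeans_transactions(transactions, seeds, max_iter=20):
    """K-Means-style partitioning with set-valued centroids."""
    centroids = [frozenset(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each transaction goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: jaccard_distance(t, centroids[k]))
            for t in transactions
        ]
        if new_labels == labels:
            break  # assignments are stable, so the loop has converged
        labels = new_labels
        # Update step: recompute the centroid of every non-empty cluster.
        for k in range(len(centroids)):
            members = [t for t, lab in zip(transactions, labels) if lab == k]
            if members:
                centroids[k] = centroid(members)
    return labels, centroids
```

Calling `kmeans_transactions(sessions, seeds)` with a list of page-name sets and one seed set per cluster returns a cluster label for each session together with the final centroids. Because a centroid is itself a set of items, it can be read directly as a summary of the pages that characterize a cluster.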