Data Driven Similarity Measures for k-Means Like Clustering Algorithms

Authors:
Jacob Kogan;Marc Teboulle;Charles Nicholas
Affiliations:
Department of Mathematics and Statistics, UMBC, Baltimore 21250;School of Mathematical Sciences, Tel-Aviv University, Tel-Aviv, Israel;Department of Computer Science and Electrical Engineering, UMBC, Baltimore 21250
Venue:
Information Retrieval
Year:
2005

Citing 7
Cited 1

Parallel and distributed computation: numerical methods
Interior Proximal and Multiplier Methods Based on Second Order Homogeneous Kernels. Mathematics of Operations Research
Concept decompositions for large sparse text data using clustering. Machine Learning
Convergence of Proximal-Like Algorithms. SIAM Journal on Optimization
Clustering large unstructured document sets. Computational information retrieval
Enhanced word clustering for hierarchical text classification. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Pattern Classification (2nd Edition)
A clustering scheme for large high-dimensional document datasets. ISICA'07 Proceedings of the 2nd international conference on Advances in computation and intelligence

Abstract

We present an optimization approach that generates k-means-like clustering algorithms. Batch k-means and incremental k-means are two well-known versions of the classical k-means clustering algorithm (Duda et al. 2000). To benefit from the speed of the batch version and the accuracy of the incremental version, we combine the two in a "ping-pong" fashion. We use a distance-like function that combines the squared Euclidean distance with relative entropy. In the extreme cases our algorithm recovers the classical k-means clustering algorithm and generalizes the Divisive Information Theoretic clustering algorithm recently reported independently by Berkhin and Becher (2002) and Dhillon et al. (2002). Results of numerical experiments that demonstrate the viability of our approach are reported.
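
As a rough illustration of a distance-like function that mixes the squared Euclidean distance with relative entropy, here is a minimal Python sketch. The abstract does not give the formula, so everything specific here is an assumption: the weights `nu` and `mu`, the argument order, the generalized form of the relative entropy, the `eps` guard, and the function names `combined_distance` and `assign` are all hypothetical. The sketch covers only the distance and a nearest-centroid assignment, not the batch or incremental updates.

```python
import numpy as np

def combined_distance(x, c, nu=1.0, mu=1.0, eps=1e-12):
    """Hypothetical distance-like function (an assumption, not the paper's formula):
    nu/2 * ||x - c||^2 + mu * sum(x*log(x/c) - x + c)."""
    x = np.asarray(x, dtype=float)
    c = np.asarray(c, dtype=float)
    # Squared Euclidean part, scaled by 1/2.
    sq_euclid = 0.5 * np.sum((x - c) ** 2)
    # Generalized relative entropy for nonnegative vectors; eps guards
    # against log(0) and terms with x == 0 contribute only -x + c.
    log_term = np.where(x > 0, x * np.log((x + eps) / (c + eps)), 0.0)
    rel_entropy = np.sum(log_term - x + c)
    return nu * sq_euclid + mu * rel_entropy

def assign(points, centroids, nu=1.0, mu=1.0):
    """Assign each point to the centroid with the smallest combined distance."""
    return [
        int(np.argmin([combined_distance(x, c, nu, mu) for c in centroids]))
        for x in points
    ]
```

In this sketch, setting `mu=0` leaves only the (scaled) squared Euclidean term, which is the distance used by classical k-means, while setting `nu=0` leaves only the relative entropy term; these correspond loosely to the two extreme cases mentioned above.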