A new information theoretic analysis of sum-of-squared-error kernel clustering

Authors:
Robert Jenssen;Torbjørn Eltoft
Affiliations:
Department of Physics and Technology, University of Tromsø, N-9037 Tromsø, Norway;Department of Physics and Technology, University of Tromsø, N-9037 Tromsø, Norway
Venue:
Neurocomputing
Year:
2008

Abstract

This paper provides a new input-space analysis of sum-of-squared-error K-means clustering performed in a Mercer kernel feature space. Such an analysis has been missing until now, even though kernel K-means has been popular in the clustering literature. Our derivation extends the theory of traditional K-means from properties of mean vectors to information-theoretic properties of probability density functions (pdfs) estimated with Parzen windows. In particular, Euclidean distance-based kernel K-means is shown to maximize an integrated squared error divergence measure between the cluster pdfs and the overall pdf of the data, while a cosine similarity-based approach maximizes a Cauchy-Schwarz divergence measure. Furthermore, the iterative rules that assign data points to clusters in order to maximize these criteria are shown to depend on the cluster pdfs evaluated at the data points, in addition to the Rényi entropies of the clusters. The Bayes rule is shown to be a special case.
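The assignment rule described in the abstract can be illustrated with a minimal sketch of Euclidean distance-based kernel K-means. This is not the authors' code. The function names, the unnormalized Gaussian kernel, the width `sigma`, and the initialization scheme are all assumptions made for illustration. With a Gaussian kernel, the squared feature-space distance from a point to a cluster mean expands into a constant, a term proportional to a Parzen-type estimate of the cluster pdf at that point, and the mean pairwise kernel value within the cluster. The negative logarithm of that last term is treated here as a Rényi quadratic entropy estimate, up to normalizing constants and the kernel-width convention, which the abstract does not specify.

```python
import numpy as np


def gaussian_gram(X, Y, sigma):
    """Unnormalized Gaussian kernel matrix between the rows of X and Y."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def renyi_quadratic_entropy(X, sigma):
    """-log of the mean pairwise kernel value (Renyi H2 up to a constant)."""
    return -np.log(gaussian_gram(X, X, sigma).mean())


def kernel_kmeans(X, n_clusters, sigma, init_labels, n_iter=50):
    """Hypothetical helper: kernel K-means written in input-space terms.

    ||phi(x) - m_k||^2 = k(x, x) - 2 * density_k(x) + potential_k, and
    k(x, x) = 1 for this kernel, so minimizing the distance is the same
    as maximizing 2 * density_k(x) - potential_k.
    """
    K = gaussian_gram(X, X, sigma)
    labels = np.asarray(init_labels).copy()
    for _ in range(n_iter):
        scores = np.full((X.shape[0], n_clusters), -np.inf)
        for k in range(n_clusters):
            members = labels == k
            if not members.any():
                continue  # an empty cluster keeps a score of -inf
            # Parzen-type estimate of the cluster pdf at every data point
            density = K[:, members].mean(axis=1)
            # Mean pairwise kernel value inside the cluster, i.e. exp(-H2)
            potential = K[np.ix_(members, members)].mean()
            scores[:, k] = 2.0 * density - potential
        new_labels = scores.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

In this form a point is drawn toward the cluster whose estimated density at that point is high, with an offset that depends only on the cluster's entropy term. This matches the structure the abstract describes, though the exact constants in the paper may differ.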