Self-learning K-means clustering: a global optimization approach

Authors:
Z. Volkovich; D. Toledano-Kitai; G.-W. Weber
Affiliations:
Ort Braude College of Engineering, Karmiel, Israel 21982;Ort Braude College of Engineering, Karmiel, Israel 21982;Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey 06531 and University of Siegen, Siegen, Germany and University of Aveiro, Aveiro, Portugal and Universiti Teknolo ...
Venue:
Journal of Global Optimization
Year:
2013

Abstract

An appropriate distance is an essential ingredient in many real-world learning tasks. Distance metric learning seeks a metric that reflects the data configuration better than the commonly used distances do. We propose an algorithm that simultaneously learns a Mahalanobis-like distance and a K-means clustering, combining data rescaling with clustering so that data separability grows iteratively in the rescaled space as the data are successively re-clustered. At each step of the algorithm, a global optimization problem is solved to minimize the cluster distortions given the current cluster configuration. The resulting weight matrix can also serve as a cluster validation characteristic: closeness of the matrices learned during a sampling process can indicate cluster readiness, i.e., it yields an estimate of the true number of clusters. Numerical experiments on synthetic and real datasets demonstrate the high reliability of the proposed method.
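
The alternating scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the optimization problem, so the sketch assumes a diagonal weight matrix constrained to unit determinant, for which the distortion-minimizing weights have a closed form (each weight is inversely proportional to the within-cluster scatter of its feature). All function and parameter names (`self_learning_kmeans`, `learn_weights`, `init_centers`, and so on) are hypothetical.

```python
import math

def weighted_sq_dist(x, c, w):
    """Squared distance under a diagonal weight matrix diag(w) (assumed form)."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))

def assign(points, centers, w):
    """Assign each point to its nearest center in the rescaled space."""
    return [min(range(len(centers)),
                key=lambda k: weighted_sq_dist(p, centers[k], w))
            for p in points]

def update_centers(points, labels, centers):
    """Recompute each center as its cluster mean; keep it if the cluster is empty."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, l in zip(points, labels) if l == k]
        if members:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centers.append(list(old))
    return new_centers

def learn_weights(points, labels, centers, eps=1e-12):
    """Minimize sum_j w_j * s_j subject to prod_j w_j = 1 (an assumed constraint),
    where s_j is the within-cluster scatter along feature j.
    The Lagrange conditions give w_j = geometric_mean(s) / s_j."""
    d = len(points[0])
    scatter = [eps] * d  # eps guards against zero scatter
    for p, l in zip(points, labels):
        for j in range(d):
            scatter[j] += (p[j] - centers[l][j]) ** 2
    geo_mean = math.exp(sum(math.log(s) for s in scatter) / d)
    return [geo_mean / s for s in scatter]

def self_learning_kmeans(points, init_centers, n_outer=10, n_inner=50, tol=1e-9):
    """Alternate K-means in the rescaled space with a weight update."""
    centers = [list(c) for c in init_centers]
    w = [1.0] * len(points[0])  # start from the Euclidean metric
    for _ in range(n_outer):
        for _ in range(n_inner):  # K-means under the current weights
            labels = assign(points, centers, w)
            new_centers = update_centers(points, labels, centers)
            if new_centers == centers:
                break
            centers = new_centers
        new_w = learn_weights(points, labels, centers)
        if all(abs(a - b) < tol for a, b in zip(new_w, w)):
            break  # weights have stabilized
        w = new_w
    return labels, centers, w

# Two groups separated along feature 0, with wider spread along feature 1.
group_a = [(x0, x1) for x0 in (-0.1, 0.0, 0.1) for x1 in (-1.0, 0.0, 1.0)]
group_b = [(x0 + 10.0, x1) for x0, x1 in group_a]
labels, centers, w = self_learning_kmeans(group_a + group_b,
                                          [group_a[0], group_b[-1]])
print(labels, w)
```

In this toy run the feature with the smaller within-cluster scatter receives the larger weight, so the clusters become more compact in the rescaled space. Like ordinary K-means, the sketch depends on the initial centers and can settle in a poor local solution; the global optimization step mentioned in the abstract is replaced here by the closed-form update.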