An appropriate distance is an essential ingredient in many real-world learning tasks. Distance metric learning aims to learn a metric that reflects the configuration of the data much better than commonly used alternatives. We propose an algorithm that simultaneously learns a Mahalanobis-like distance and performs K-means clustering, integrating data rescaling with clustering so that cluster separability increases iteratively in the rescaled space as the data are repeatedly clustered. At each step of the algorithm, a global optimization problem is solved to minimize the cluster distortions given the current cluster configuration. The learned weight matrix can also serve as a cluster validation measure: the closeness of the matrices learned across repeated samples of the data can indicate whether the clusters are well formed, and thus help estimate the true number of clusters. Numerical experiments on synthetic and real datasets demonstrate the high reliability of the proposed method.
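The abstract does not spell out the optimization problem solved at each step. The following is a hedged illustration that assumes one standard formulation. The total within-cluster distortion, the sum of (x - c)^T A (x - c), is minimized over symmetric positive-definite matrices A with det(A) = 1. The closed-form minimizer of this problem is A = det(W)^(1/d) W^(-1), where W is the pooled within-cluster scatter matrix. Several pieces are assumptions made for illustration and are not the authors' method: the determinant constraint, the function names `adaptive_kmeans`, `metric_update` and `metric_stability`, the subsampling fraction, and the Frobenius-norm comparison between matrices. The sketch alternates Mahalanobis-weighted K-means steps with this metric update. For validation, it compares the matrices learned on random subsamples for a given number of clusters.

```python
import numpy as np


def metric_update(X, labels, centers):
    """Assumed metric step: minimize tr(A W) subject to det(A) = 1.

    W is the pooled within-cluster scatter; the minimizer is
    A = det(W)^(1/d) * inv(W).
    """
    d = X.shape[1]
    diffs = X - centers[labels]
    W = diffs.T @ diffs + 1e-9 * np.eye(d)  # tiny ridge keeps W invertible
    _, logdet = np.linalg.slogdet(W)
    return np.exp(logdet / d) * np.linalg.inv(W)


def assign(X, centers, A):
    """Assign each point to the nearest center under (x - c)^T A (x - c)."""
    diff = X[:, None, :] - centers[None, :, :]
    return np.einsum("nkd,de,nke->nk", diff, A, diff).argmin(axis=1)


def adaptive_kmeans(X, k, n_iter=50, seed=0):
    """Hypothetical alternation of K-means steps and metric updates."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    A = np.eye(d)  # start from the Euclidean metric (det = 1)
    history = []
    for _ in range(n_iter):
        labels = assign(X, centers, A)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
        A = metric_update(X, labels, centers)
        diffs = X - centers[labels]
        history.append(float(np.einsum("nd,de,ne->n", diffs, A, diffs).sum()))
    return labels, centers, A, history


def metric_stability(X, k, n_samples=10, frac=0.7, seed=0):
    """Mean pairwise Frobenius distance between metrics learned on subsamples.

    A smaller value means the learned matrices agree more closely
    (a hypothetical proxy for the matrix-closeness criterion).
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    mats = []
    for s in range(n_samples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        _, _, A, _ = adaptive_kmeans(X[idx], k, seed=s)
        mats.append(A)
    dists = [np.linalg.norm(mats[i] - mats[j])
             for i in range(len(mats)) for j in range(i + 1, len(mats))]
    return float(np.mean(dists))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(loc, 0.5, size=(100, 2))
                   for loc in ([0.0, 0.0], [5.0, 0.0], [0.0, 5.0])])
    labels, centers, A, history = adaptive_kmeans(X, k=3)
    print("learned A:\n", A)
    for k in range(2, 6):
        print(k, metric_stability(X, k))
```

Under this formulation, no individual step can increase the distortion: the assignment step, the centroid update, and the metric update all either lower it or leave it unchanged (up to the tiny ridge). The recorded objective is therefore non-increasing. The sketch does not settle how `metric_stability` scores for different k should be turned into an estimate of the number of clusters. Normalization across k, for example, is one open question.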