Semi-supervised clustering using heterogeneous dissimilarities

  • Authors:
  • Manuel Martín-Merino

  • Affiliations:
  • Universidad Pontificia de Salamanca, Salamanca, Spain

  • Venue:
  • SSPR&SPR'10 Proceedings of the 2010 joint IAPR international conference on Structural, syntactic, and statistical pattern recognition
  • Year:
  • 2010


Abstract

The performance of many clustering algorithms, such as k-means, depends strongly on the dissimilarity used to evaluate sample proximities. Choosing a good dissimilarity is a difficult task because each dissimilarity reflects different features of the data. Therefore, different dissimilarities should be integrated in order to reflect more accurately what is similar for the user and the problem at hand. In many applications, user feedback or a priori knowledge about the problem provides pairs of similar and dissimilar examples. This side information can be used to learn a distance metric and to improve the clustering results. In this paper, we address the problem of learning a linear combination of dissimilarities using side information in the form of equivalence constraints. The error function is minimized with a quadratic optimization algorithm, and a smoothing term is included that penalizes the complexity of the family of distances and avoids overfitting. The experimental results suggest that the proposed method outperforms a standard metric learning algorithm and improves classical k-means clustering based on a single dissimilarity.
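To make the idea concrete, the following is a minimal sketch of learning a non-negative linear combination of several dissimilarity matrices from equivalence constraints. It is a hypothetical illustration, not the paper's exact formulation: the smoothing term is modeled here as a simple ridge penalty, and the quadratic problem is solved in closed form with a non-negativity clip; all function and variable names are assumptions.

```python
import numpy as np

def learn_dissimilarity_weights(D_list, similar_pairs, dissimilar_pairs, lam=0.1):
    """Learn weights for a linear combination of K dissimilarity matrices
    from equivalence constraints (illustrative sketch only).

    D_list           : list of K (n x n) dissimilarity matrices.
    similar_pairs    : (i, j) pairs known to belong to the same cluster.
    dissimilar_pairs : (i, j) pairs known to belong to different clusters.
    lam              : ridge "smoothing" term penalizing weight complexity.
    """
    # One feature vector per constrained pair: its K dissimilarity values.
    X, y = [], []
    for (i, j) in similar_pairs:
        X.append([D[i, j] for D in D_list])
        y.append(0.0)  # similar pairs should get a small combined distance
    for (i, j) in dissimilar_pairs:
        X.append([D[i, j] for D in D_list])
        y.append(1.0)  # dissimilar pairs should get a large combined distance
    X, y = np.asarray(X), np.asarray(y)
    K = X.shape[1]
    # Ridge-regularized least squares: a quadratic problem with a closed form.
    w = np.linalg.solve(X.T @ X + lam * np.eye(K), X.T @ y)
    # Clip to keep the combination a valid (non-negative) dissimilarity.
    return np.clip(w, 0.0, None)
```

The learned weights define a combined dissimilarity `sum(w[k] * D_list[k])`, which can then be plugged into a dissimilarity-based clustering routine (e.g. k-medoids or kernel k-means); dissimilarities that agree with the constraints receive larger weights than uninformative ones.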