Semi-supervised clustering of large data sets with kernel methods

Authors:
Stefan Faußer; Friedhelm Schwenker
Venue:
Pattern Recognition Letters
Year:
2014

Citing 11
Cited 0

Nonlinear component analysis as a kernel eigenvalue problem
Neural Computation

Efficient and Effective Clustering Methods for Spatial Data Mining
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases

Clustering Data Streams: Theory and Practice
IEEE Transactions on Knowledge and Data Engineering

Cluster ensembles --- a knowledge reuse framework for combining multiple partitions
The Journal of Machine Learning Research

Clustering Incomplete Data Using Kernel-Based Fuzzy C-means Algorithm
Neural Processing Letters

A probabilistic framework for semi-supervised clustering
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Semi-supervised graph clustering: a kernel approach
ICML '05 Proceedings of the 22nd international conference on Machine learning

Batch and median neural gas
Neural Networks - 2006 Special issue: Advances in self-organizing maps--WSOM'05

Data clustering: 50 years beyond K-means
Pattern Recognition Letters

Parallelized kernel patch clustering
ANNPR'10 Proceedings of the 4th IAPR TC3 conference on Artificial Neural Networks in Pattern Recognition

Semi-Supervised kernel clustering with sample-to-cluster weights
PSL'11 Proceedings of the First IAPR TC3 conference on Partially Supervised Learning

Quantified Score

Hi-index	0.10

Abstract

Labelling real-world data sets is a difficult problem. Often the human expert is unsure about the class label of a specific sample point, or, in the case of very large data sets, labelling all points manually is impractical. In semi-supervised clustering, the available sample labels serve as external information that helps to find better-matching cluster partitions. Further, kernel-based clustering methods are able to partition the data with nonlinear boundaries in feature space. While these methods improve the clustering results, their computation time is quadratic in the number of samples. In this paper, we propose a meta-algorithm that processes small subsets of a large data set one at a time: it clusters each subset using the sample labels, then merges the points closest to the resulting prototypes with the next subset, until the whole data set has been processed. Its computation time is linear in the number of samples. The error function that this meta-algorithm minimizes is presented. Although we applied this meta-algorithm to Kernel Fuzzy C-Means, Relational Neural Gas and Kernel K-Means, it can be applied to a broad range of kernel-based clustering methods. The proposed method has been empirically evaluated on two real-world benchmark data sets.
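
The chunk-wise scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain (unsupervised) kernel k-means as the base clusterer, leaves out the sample labels entirely, and assumes a Gaussian kernel. The function names and the parameters `chunk_size`, `keep_per_cluster` and `gamma` are hypothetical and chosen only for illustration.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y (assumed kernel)."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def prototype_distances(K, labels, k):
    """Squared feature-space distance of every point to every cluster mean."""
    n = K.shape[0]
    dist = np.full((n, k), np.inf)  # empty clusters stay at infinity
    for c in range(k):
        members = labels == c
        m = members.sum()
        if m == 0:
            continue
        dist[:, c] = (np.diag(K)
                      - 2.0 * K[:, members].sum(1) / m
                      + K[np.ix_(members, members)].sum() / m ** 2)
    return dist


def kernel_kmeans(K, k, n_iter=20, seed=0):
    """Plain kernel k-means on a precomputed kernel matrix (labels not used)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=K.shape[0])
    for _ in range(n_iter):
        new_labels = prototype_distances(K, labels, k).argmin(1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, prototype_distances(K, labels, k)


def patch_clustering(X, k, chunk_size=200, keep_per_cluster=5, gamma=1.0):
    """Cluster X chunk by chunk, carrying only points near the prototypes."""
    carried = np.empty((0, X.shape[1]))
    for start in range(0, len(X), chunk_size):
        # Merge the carried-over points with the next chunk of raw data.
        patch = np.vstack([carried, X[start:start + chunk_size]])
        K = rbf_kernel(patch, patch, gamma)
        labels, dist = kernel_kmeans(K, k)
        keep = []
        for c in range(k):
            idx = np.flatnonzero(labels == c)
            # Keep the members of cluster c that lie closest to its prototype.
            keep.extend(idx[np.argsort(dist[idx, c])[:keep_per_cluster]])
        carried = patch[np.array(keep, dtype=int)]
    return carried  # representatives summarising the whole data set
```

In this sketch each kernel matrix has at most `chunk_size + k * keep_per_cluster` rows, so the cost per chunk is bounded and the total cost grows linearly with the number of chunks. A semi-supervised variant would additionally have to carry the labels of the retained points and use them inside the base clusterer.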