Labelling real-world data sets is a difficult problem. Often the human expert is unsure about the class label of a specific sample point, or the data set is so large that labelling it manually is impractical. Semi-supervised clustering uses the available sample labels as external information to find better-matching cluster partitions. Further, kernel-based clustering methods are able to partition the data with nonlinear boundaries in feature space. While these methods improve the clustering results, their computation time is quadratic in the number of samples. In this paper, we propose a meta-algorithm that processes a large data set in small subsets: it clusters each subset using the sample labels, then merges the points close to the resulting prototypes with the next subset, and repeats until the whole data set has been processed. Its computation time is linear in the size of the data set. We present the error function that this meta-algorithm minimizes. Although we applied the meta-algorithm to Kernel Fuzzy C-Means, Relational Neural Gas and Kernel K-Means, it can be applied to a broad range of kernel-based clustering methods. The proposed method has been evaluated empirically on two real-world benchmark data sets.
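The subset-by-subset loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It uses plain (unsupervised) kernel k-means as the inner clustering step, so the sample labels and any weighting of carried-over points are left out. The names `rbf_kernel`, `kernel_kmeans` and `patch_clustering`, the parameters `patch_size`, `n_keep` and `gamma`, and the farthest-first seeding are all assumptions made for this illustration.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def _init_labels(K, k):
    """Assumed seeding: farthest-first in feature space, then nearest seed."""
    n = len(K)
    seeds = [0]
    while len(seeds) < min(k, n):
        rest = [i for i in range(n) if i not in seeds]
        # Next seed: the point least similar to its most similar seed.
        seeds.append(min(rest, key=lambda i: max(K[i][s] for s in seeds)))
    return [max(range(len(seeds)), key=lambda c: K[i][seeds[c]]) for i in range(n)]


def _distances(K, labels, k):
    """Squared feature-space distance of every point to every cluster mean."""
    n = len(K)
    dist = [[math.inf] * k for _ in range(n)]
    for c in range(k):
        m = [i for i in range(n) if labels[i] == c]
        if not m:
            continue  # empty cluster: distance stays infinite
        within = sum(K[a][b] for a in m for b in m) / len(m) ** 2
        for i in range(n):
            cross = sum(K[i][j] for j in m) / len(m)
            dist[i][c] = K[i][i] - 2.0 * cross + within
    return dist


def kernel_kmeans(points, k, gamma=1.0, n_iter=20):
    """Plain kernel k-means on one patch; cost is quadratic in the patch size."""
    K = [[rbf_kernel(p, q, gamma) for q in points] for p in points]
    labels = _init_labels(K, k)
    for _ in range(n_iter):
        dist = _distances(K, labels, k)
        new_labels = [min(range(k), key=row.__getitem__) for row in dist]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, _distances(K, labels, k)


def patch_clustering(data, k, patch_size, n_keep=3, gamma=1.0):
    """Cluster `data` patch by patch.

    After each patch, the `n_keep` points closest to each prototype are
    carried over and merged with the next patch. Returns, per cluster, the
    representatives kept after the last patch.
    """
    carried, reps = [], [[] for _ in range(k)]
    for start in range(0, len(data), patch_size):
        patch = carried + list(data[start:start + patch_size])
        labels, dist = kernel_kmeans(patch, k, gamma)
        reps = []
        for c in range(k):
            members = [i for i in range(len(patch)) if labels[i] == c]
            members.sort(key=lambda i: dist[i][c])
            reps.append([patch[i] for i in members[:n_keep]])
        carried = [p for cluster in reps for p in cluster]
    return reps
```

Each patch holds at most `patch_size + k * n_keep` points, so the quadratic kernel computation is bounded per patch and the total cost grows linearly with the number of patches, in line with the linear running time stated above. A semi-supervised variant would replace `kernel_kmeans` with an inner method that also takes the sample labels.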