The kernel k-means clustering method has proven effective at identifying non-isotropic and linearly inseparable clusters in the input space. However, it is not suitable for large datasets because its time complexity is quadratic in the size of the dataset. This paper presents a simple prototype-based hybrid approach to speed up the kernel k-means clustering method for large datasets. The proposed method works in two stages. First, the dataset is partitioned into a number of small grouplets using the leaders clustering method, which takes the size of each grouplet, called the threshold t, as an input parameter. The conventional leaders clustering method is modified so that these grouplets are formed in the kernel-induced feature space, while each grouplet is represented by a pattern (called its leader) in the input space. The dataset is re-indexed according to these grouplets. Second, the kernel k-means clustering method is applied to the set of leaders to derive a partition of the leaders set, and each leader is then replaced by its grouplet to obtain a partition of the entire dataset. Both the time complexity and the space complexity of the proposed method are O(n+p^2), where n is the size of the dataset and p is the number of leaders. The overall running time and the quality of the clustering result depend on the threshold t and on the order in which the dataset is scanned. This paper studies how the input parameter t affects the overall running time and the clustering quality obtained by the proposed method. Further, it is shown both theoretically and experimentally how the order in which the dataset is scanned affects the clustering result. The proposed method is also compared with other recent methods proposed to speed up the kernel k-means clustering method.
Experiments on several real-world and synthetic datasets show that, for an appropriate value of t, the proposed method can significantly reduce computation time with only a small loss in clustering quality, particularly for large datasets.
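
The two-stage procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration only: the RBF kernel and its `gamma` parameter, the reading of the threshold t as a distance in the kernel-induced feature space, the first-fit leader assignment rule, the unweighted treatment of leaders in kernel k-means, and the seeding of clusters from k randomly chosen leaders. All function names (`rbf`, `kernel_leaders`, `kernel_kmeans`, `hybrid_kernel_kmeans`) are hypothetical.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    # Assumed kernel for illustration; any Mercer kernel could be substituted.
    d = x - y
    return np.exp(-gamma * np.dot(d, d))

def feature_dist2(x, y, kernel):
    # Squared feature-space distance via the kernel trick:
    # ||phi(x) - phi(y)||^2 = k(x,x) - 2 k(x,y) + k(y,y)
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)

def kernel_leaders(X, t, kernel):
    # Single scan over X: a pattern joins the first leader within
    # feature-space distance t, otherwise it becomes a new leader.
    # Leaders are stored as indices of patterns in the input space.
    leaders = []
    assign = np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        for j, l in enumerate(leaders):
            if feature_dist2(x, X[l], kernel) <= t * t:
                assign[i] = j
                break
        else:
            leaders.append(i)
            assign[i] = len(leaders) - 1
    return np.array(leaders), assign

def kernel_kmeans(K, k, n_iter=50, seed=0):
    # Kernel k-means on a p x p Gram matrix K (requires k <= p).
    p = K.shape[0]
    diag = np.diag(K)
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct points act as initial centers.
    seeds = rng.choice(p, size=k, replace=False)
    d0 = diag[:, None] - 2.0 * K[:, seeds] + diag[seeds][None, :]
    labels = d0.argmin(axis=1)
    for _ in range(n_iter):
        dist = np.full((p, k), np.inf)
        for c in range(k):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue  # an empty cluster stays at infinite distance
            # ||phi(x_i) - mu_c||^2 written in terms of kernel values only
            dist[:, c] = (diag - 2.0 * K[:, mask].sum(axis=1) / m
                          + K[np.ix_(mask, mask)].sum() / m ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def hybrid_kernel_kmeans(X, k, t, kernel, seed=0):
    # Stage 1: form grouplets and pick their leaders.
    leaders, assign = kernel_leaders(X, t, kernel)
    # Stage 2: run kernel k-means over the leaders only.
    L = X[leaders]
    K = np.array([[kernel(a, b) for b in L] for a in L])
    leader_labels = kernel_kmeans(K, k, seed=seed)
    # Each pattern inherits the cluster label of its leader.
    return leader_labels[assign], leaders
```

Because the Gram matrix is built over the p leaders rather than all n patterns, the quadratic cost applies only to the leader set. A call such as `labels, leaders = hybrid_kernel_kmeans(X, k=2, t=0.5, kernel=rbf)` returns one label per pattern. Scanning `X` in a different order can change which patterns become leaders, which is the order dependence the abstract refers to.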