Speeding-up the kernel k-means clustering method: A prototype based hybrid approach

Authors:
T. Hitendra Sarma;P. Viswanath;B. Eswara Reddy
Affiliations:
Department of Computer Science and Engineering, Srinivasa Ramanujan Institute of Technology, Anantapur 515701, A.P., India;Department of Computer Science and Engineering, Rajeev Gandhi Memorial College of Eng. and Technology, Nandyal 518501, A.P., India;Department of Computer Science and Engineering, JNTUA College of Engineering, Anantapur, 515002, A.P., India
Venue:
Pattern Recognition Letters
Year:
2013

Abstract

The kernel k-means clustering method has been shown to be effective in identifying non-isotropic and linearly inseparable clusters in the input space. However, it is not suitable for large datasets because its time complexity is quadratic in the size of the dataset. This paper presents a simple prototype-based hybrid approach to speed up the kernel k-means clustering method for large datasets. The proposed method works in two stages. First, the dataset is partitioned into a number of small grouplets using the leaders clustering method, which takes the size of each grouplet, called the threshold t, as an input parameter. The conventional leaders clustering method is modified so that these grouplets are formed in the kernel-induced feature space, but each grouplet is represented by a pattern (called its leader) in the input space. The dataset is re-indexed according to these grouplets. Second, the kernel k-means clustering method is applied to the set of leaders to derive a partition of the leaders set. Finally, each leader is replaced by its grouplet to obtain a partition of the entire dataset. Both the time complexity and the space complexity of the proposed method are O(n+p^2), where n is the size of the dataset and p is the number of leaders. The overall running time and the quality of the clustering result depend on the threshold t and on the order in which the dataset is scanned. This paper presents a study of how the input parameter t affects the overall running time and the clustering quality obtained by the proposed method. Further, it is shown both theoretically and experimentally how the order in which the dataset is scanned affects the clustering result. The proposed method is also compared with other recent methods proposed to speed up the kernel k-means clustering method.
An experimental study with several real-world and synthetic datasets shows that, for an appropriate value of t, the proposed method can significantly reduce the computation time at the cost of a small loss in clustering quality, particularly for large datasets.
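
The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not specify: an RBF kernel with a hypothetical `gamma` parameter, assignment of each pattern to the first leader within feature-space distance t, unweighted kernel k-means on the leaders, and random initial labels. The function names `rbf_kernel`, `kernel_leaders`, `kernel_kmeans`, and `hybrid_kernel_kmeans` are illustrative only.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Assumed kernel: K(x, y) = exp(-gamma * ||x - y||^2)
    d2 = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_leaders(X, t, gamma=1.0):
    """Single scan: a pattern joins the first leader within feature-space
    distance t, otherwise it becomes a new leader (assumed rule)."""
    X = np.asarray(X, dtype=float)
    leaders = []                        # indices of leader patterns in X
    assign = np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        if leaders:
            k = rbf_kernel(x[None, :], X[leaders], gamma)[0]
            d2 = 2.0 - 2.0 * k          # ||phi(x) - phi(l)||^2, since K(x, x) = 1
            near = np.flatnonzero(d2 <= t * t)
            if near.size:
                assign[i] = near[0]
                continue
        leaders.append(i)
        assign[i] = len(leaders) - 1
    return leaders, assign

def kernel_kmeans(K, k, n_iter=50, seed=0):
    """Plain kernel k-means on a p x p kernel matrix K (random initial labels)."""
    rng = np.random.default_rng(seed)
    p = K.shape[0]
    labels = rng.integers(k, size=p)
    for _ in range(n_iter):
        dist = np.full((p, k), np.inf)  # empty clusters stay at infinite distance
        for c in range(k):
            members = labels == c
            m = members.sum()
            if m == 0:
                continue
            # ||phi(x) - mu_c||^2 = K_xx - 2 * mean_j K_xj + mean_jl K_jl
            dist[:, c] = (np.diag(K) - 2.0 * K[:, members].sum(1) / m
                          + K[np.ix_(members, members)].sum() / m ** 2)
        new_labels = dist.argmin(1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def hybrid_kernel_kmeans(X, k, t, gamma=1.0, seed=0):
    X = np.asarray(X, dtype=float)
    leaders, assign = kernel_leaders(X, t, gamma)   # stage 1: one pass over n patterns
    L = X[leaders]
    K = rbf_kernel(L, L, gamma)                     # stage 2: only a p x p kernel matrix
    leader_labels = kernel_kmeans(K, k, seed=seed)
    return leader_labels[assign], leaders           # each pattern inherits its leader's label
```

Under the RBF assumption the feature-space distance between any two patterns lies below sqrt(2), so useful values of t fall in that range. Because a pattern becomes a leader only when no earlier leader is close enough, shuffling the rows of `X` can change the set of leaders, which is the scan-order dependence the abstract mentions.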