Approximate kernel k-means: solution to large scale kernel clustering

Authors:
Radha Chitta;Rong Jin;Timothy C. Havens;Anil K. Jain
Affiliations:
Michigan State University, East Lansing, MI, USA;Michigan State University, East Lansing, MI, USA;Michigan State University, East Lansing, MI, USA;Michigan State University, East Lansing, MI, USA
Venue:
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2011

Abstract

The explosion of digital data mandates the development of scalable tools to organize the data in a meaningful and easily accessible form. Clustering is a commonly used tool for data organization. However, many clustering algorithms designed to handle large data sets assume linear separability of the data and hence do not perform well on real-world data sets. While kernel-based clustering algorithms can capture the non-linear structure in data, they do not scale well in terms of speed and memory requirements when the number of objects to be clustered exceeds tens of thousands. We propose an approximation scheme for kernel k-means, termed approximate kernel k-means, that reduces both the computational complexity and the memory requirements by employing a randomized approach. We show both analytically and empirically that the performance of approximate kernel k-means is similar to that of the kernel k-means algorithm, but with dramatically reduced run-time complexity and memory requirements.
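
The abstract does not spell out the randomized scheme, so the following is only a minimal sketch of one way such an approximation can work, not the authors' implementation. It assumes that cluster centers are restricted to the span of m randomly sampled points, so that only an n x m kernel block is ever computed instead of the full n x n kernel matrix. The function names, the RBF kernel, the `gamma` parameter, and the random initialization are all illustrative assumptions.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel values between every row of A and every row of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def approx_kernel_kmeans(X, n_clusters, n_samples, gamma=1.0, n_iter=50, seed=0):
    """Kernel k-means with centers constrained to the span of a random sample.

    Hypothetical helper for illustration: memory is O(n * m) for the
    kernel block rather than O(n^2) for the full kernel matrix.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Randomized step: draw m points and evaluate the kernel only against them.
    idx = rng.choice(n, size=n_samples, replace=False)
    K_B = rbf_kernel(X, X[idx], gamma)   # n x m block: all points vs. sample
    K_hat = K_B[idx]                     # m x m block: sample vs. sample
    K_hat_pinv = np.linalg.pinv(K_hat, rcond=1e-10)

    labels = rng.integers(n_clusters, size=n)  # random initial partition
    for _ in range(n_iter):
        # Row k of U averages the members of cluster k.
        U = np.zeros((n_clusters, n))
        U[labels, np.arange(n)] = 1.0
        U /= np.maximum(U.sum(axis=1, keepdims=True), 1.0)

        # Coefficients of each center in the basis of the sampled points.
        alpha = U @ K_B @ K_hat_pinv     # k x m

        # Squared feature-space distance to each center, dropping the
        # per-point constant k(x_i, x_i), which does not affect the argmin.
        center_norm = np.einsum("km,mj,kj->k", alpha, K_hat, alpha)
        dist = center_norm[None, :] - 2.0 * K_B @ alpha.T   # n x k

        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

A call such as `approx_kernel_kmeans(X, n_clusters=2, n_samples=20, gamma=0.1)` returns one integer label per row of `X`. In this sketch the sample size m trades accuracy for cost: a larger m brings the result closer to exact kernel k-means, while the kernel block grows linearly in m.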