A time-efficient pattern reduction algorithm for k-means clustering

Authors:
Ming-Chao Chiang; Chun-Wei Tsai; Chu-Sing Yang
Affiliations:
Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung 80424, Taiwan, ROC; Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung 80424, Taiwan, ROC and Department of Electrical Engineering, National Cheng Kung University, Tainan 70101 ...; Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan, ROC
Venue:
Information Sciences: an International Journal
Year:
2011

Abstract

This paper presents an efficient algorithm, called pattern reduction (PR), for reducing the computation time of k-means and k-means-based clustering algorithms. At each iteration, the proposed algorithm compresses and removes the patterns that are unlikely to change their cluster membership in later iterations. The algorithm is not only simple and easy to implement but can also be applied to many other iterative clustering algorithms, such as kernel-based and population-based clustering algorithms. Our experiments, on data sets ranging from 2 to 1000 dimensions and from 150 to 10,000,000 patterns, indicate that, with a small loss of quality, the proposed algorithm can significantly reduce the computation time of all the state-of-the-art clustering algorithms evaluated in this paper, especially for large and high-dimensional data sets.
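The compress-and-remove idea described in the abstract can be illustrated with a minimal Python sketch of k-means plus a pattern-reduction step. The abstract does not say how PR decides that a pattern is unlikely to change membership, so this sketch assumes a hypothetical rule: a pattern whose cluster label has stayed the same for `stable_iters` consecutive iterations is dropped from the assignment step, and its coordinates are folded into a per-cluster running sum and count (the "compressed" pattern). The function name `kmeans_pr`, the `stable_iters` parameter, and the random initialization are illustrative assumptions, not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pr(points: List[Point], k: int, max_iter: int = 100,
              stable_iters: int = 2, seed: int = 0):
    """k-means with a pattern-reduction step (illustrative sketch only).

    ASSUMPTION (hypothetical criterion, not from the paper): a pattern is
    treated as unlikely to change membership once its label has stayed the
    same for `stable_iters` consecutive iterations.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)        # current cluster of every pattern
    unchanged = [0] * len(points)      # consecutive iterations with the same label
    active = list(range(len(points)))  # patterns still reassigned each iteration
    # Compressed patterns: per-cluster sum and count of all removed patterns.
    frozen_sum = [[0.0] * dim for _ in range(k)]
    frozen_cnt = [0] * k

    for _ in range(max_iter):
        changed = False
        # Assignment step: only active patterns are compared with the centroids.
        for i in active:
            best = min(range(k), key=lambda c: sq_dist(points[i], centroids[c]))
            if best == labels[i]:
                unchanged[i] += 1
            else:
                labels[i], unchanged[i], changed = best, 0, True
        # Update step: start from the compressed patterns, then add active members.
        sums = [row[:] for row in frozen_sum]
        cnts = frozen_cnt[:]
        for i in active:
            c = labels[i]
            cnts[c] += 1
            for d in range(dim):
                sums[c][d] += points[i][d]
        for c in range(k):
            if cnts[c]:  # keep the old centroid if a cluster is empty
                centroids[c] = [s / cnts[c] for s in sums[c]]
        # Pattern reduction: fold stable patterns into their cluster's
        # compressed pattern and drop them from later assignment steps.
        still_active = []
        for i in active:
            if unchanged[i] >= stable_iters:
                c = labels[i]
                frozen_cnt[c] += 1
                for d in range(dim):
                    frozen_sum[c][d] += points[i][d]
            else:
                still_active.append(i)
        active = still_active
        if not changed or not active:
            break
    return centroids, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.4), (10.0, 10.0), (10.3, 9.8), (9.7, 10.1)]
    centroids, labels = kmeans_pr(pts, k=2)
    print(labels, centroids)
```

Because each removed pattern still contributes to its cluster's sum and count, every centroid remains the mean of all patterns assigned to it, while the per-iteration assignment cost shrinks with the active set. The trade-off is that a removed pattern can no longer move even if a later centroid shift would reassign it, which is one plausible source of the small loss of quality the abstract mentions.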