Fast global k-means clustering using cluster membership and inequality

Authors:
Jim Z. C. Lai;Tsung-Jen Huang
Affiliations:
Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung 202, Taiwan;Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung 202, Taiwan and Department of Information and Communications Research Laboratories, Industrial Technology ...
Venue:
Pattern Recognition
Year:
2010

Abstract

In this paper, we present a fast global k-means clustering algorithm, referred to as MFGKM, that makes use of the cluster membership and geometrical information of a data point. The algorithm uses a set of inequalities developed in this paper to determine a starting point for the jth cluster center of global k-means clustering. By adopting multiple cluster center selection (MCS) for MFGKM, we also develop a second clustering algorithm called MFGKM+MCS. MCS determines more than one starting point at each cluster-split step, whereas the available fast and modified global k-means clustering algorithms select a single starting point per cluster split. MFGKM can obtain the least distortion, whereas MFGKM+MCS may give the least computing time. Compared with the modified global k-means clustering algorithm, MFGKM can reduce the computing time by a factor of 3.78-5.55 and the number of distance calculations by a factor of 21.13-31.41, with an average distortion reduction of 5,487 for the Statlog data set. Compared with the fast global k-means clustering algorithm, MFGKM+MCS can reduce the computing time by a factor of 5.78-8.70, with an average distortion reduction of 30,564 on the same data set. The improvements from the proposed methods are more pronounced when a higher-dimensional data set is divided into more clusters.
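
For orientation, the following is a minimal sketch of the baseline that the abstract compares against, not of MFGKM itself: the abstract does not state the paper's inequalities, so none are reproduced here. The sketch assumes the selection rule commonly given for the fast global k-means algorithm, in which the starting point for the next cluster center is the data point x_i that maximizes b_i = sum_j max(d_j - ||x_i - x_j||^2, 0), where d_j is the squared distance from x_j to its closest existing center. All function names (`sq_dist`, `lloyd`, `distortion`, `fast_global_kmeans`) are hypothetical and chosen for illustration only.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, centers, iters=100):
    """Plain k-means refinement starting from the given centers."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        # Assign every point to its closest center.
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Move every center to the mean of its members (keep it if empty).
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def distortion(points, centers):
    """Sum of squared distances from each point to its closest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def fast_global_kmeans(points, k):
    """Add centers one at a time; ONE starting point is chosen per step."""
    n = len(points)
    # The single-cluster solution is the centroid of the data set.
    centers = [[sum(col) / n for col in zip(*points)]]
    while len(centers) < k:
        # d[j]: squared distance from point j to its closest current center.
        d = [min(sq_dist(p, c) for c in centers) for p in points]
        # Assumed selection rule: pick the point with the largest
        # guaranteed distortion reduction b_i.
        best = max(
            range(n),
            key=lambda i: sum(max(d[j] - sq_dist(points[i], points[j]), 0.0)
                              for j in range(n)),
        )
        centers = lloyd(points, centers + [list(points[best])])
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (20, 0), (20, 1), (21, 0), (21, 1)]
    found = fast_global_kmeans(data, 3)
    print(sorted(found), distortion(data, found))
```

The candidate scan above costs a number of distance calculations that is quadratic in the number of data points at every step, which is the kind of cost the abstract reports reducing. Selecting several starting points per step, as MCS is described as doing, would replace the single `best` index with a set of indices; how the paper chooses that set is not stated in the abstract.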