A general grid-clustering approach

Authors:
Shihong Yue;Miaomiao Wei;Jeen-Shing Wang;Huaxiang Wang
Affiliations:
School of Electrical Engineering and Automation, Tianjin University, Tianjin 300072, China;School of Electrical Engineering and Automation, Tianjin University, Tianjin 300072, China;Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Taiwan;School of Electrical Engineering and Automation, Tianjin University, Tianjin 300072, China
Venue:
Pattern Recognition Letters
Year:
2008

Abstract

Hierarchical clustering is an important part of cluster analysis. Numerous hierarchical clustering algorithms have been developed on the basis of various theories, and new ones continue to appear in the literature. Both divisive and agglomerative hierarchical algorithms play a pivotal role in data-based models and have been applied successfully to very large datasets. However, hierarchical clustering is parameter-sensitive: when the user does not know how to choose the input parameters, the clustering results can be poor. In this paper, we propose a general grid-clustering approach (GGCA) under a common assumption about hierarchical clustering. The key features of the GGCA are: (1) the combination of divisive and agglomerative clustering algorithms into a unifying generative framework; (2) the determination, for the first time, of a key input parameter, the optimal grid size; and (3) the application of a two-phase merging process to aggregate all data objects. Consequently, the GGCA is a non-parametric algorithm that does not require users to input parameters, and it performs well on poorly separated and shape-diverse clusters. Experimental comparisons with existing methods show the superiority of the GGCA.
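
For readers unfamiliar with grid clustering in general, the idea of dividing the data space into cells and then merging neighbouring cells can be sketched as follows. This is not the GGCA procedure, whose details the abstract does not give; it is a minimal generic illustration. The function name `grid_cluster`, the fixed `cell_size`, and the `min_points` density threshold are all assumptions made for this example, and they are exactly the kind of user-supplied parameters that the GGCA is said to avoid.

```python
from collections import defaultdict, deque
from itertools import product


def grid_cluster(points, cell_size, min_points=1):
    """Generic grid clustering sketch (hypothetical helper, not the GGCA).

    Returns one label per point; -1 marks points in cells that are
    too sparse to belong to any cluster.
    """
    # Divisive step: partition the space into cells of side `cell_size`
    # and record which points fall into each cell.
    cells = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(int(coord // cell_size) for coord in point)
        cells[key].append(idx)

    # Keep only cells that hold at least `min_points` points.
    dense = {key for key, members in cells.items() if len(members) >= min_points}

    # Offsets to every neighbouring cell (sharing a face, edge, or corner).
    dim = len(points[0]) if points else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    # Agglomerative step: merge adjacent dense cells with a breadth-first search.
    labels = [-1] * len(points)
    seen = set()
    cluster_id = 0
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for idx in cells[cell]:
                labels[idx] = cluster_id
            for offset in offsets:
                neighbour = tuple(c + d for c, d in zip(cell, offset))
                if neighbour in dense and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        cluster_id += 1
    return labels


# Made-up toy data, for illustration only.
toy_points = [(0.1, 0.1), (0.9, 0.4), (1.2, 0.8), (5.1, 5.2), (5.8, 5.9), (9.5, 0.2)]
print(grid_cluster(toy_points, cell_size=1.0))                 # [0, 0, 0, 1, 1, 2]
print(grid_cluster(toy_points, cell_size=1.0, min_points=2))   # [0, 0, -1, 1, 1, -1]
```

The two calls on the same toy data give different groupings depending on `min_points`, which illustrates the parameter sensitivity the abstract describes.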