Fast adaptive clustering for very large datasets

Authors:
Kun-Che Lu;Don-Lin Yang;Jungpin Wu
Affiliations:
Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan;Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan;Department of Statistics, Feng Chia University, Taichung, Taiwan
Venue:
ICCOMP'05 Proceedings of the 9th WSEAS International Conference on Computers
Year:
2005

Abstract

In this paper, we propose an efficient and effective clustering method that requires only a single scan of the dataset. The original dataset is first transformed and merged into a hyper image of controllable size. Unlike traditional methods, the dissimilarity between objects is calculated once for all objects by using image processing techniques such as morphological operations. Connected-component extraction is then used to extract clusters from the hyper image. The proposed method readily supports fuzzy and hierarchical clustering under a single dataset scan. It is also efficient for incremental and dynamic clustering, which require no additional scan of the original dataset. Experimental results show that the proposed method is robust and stable under various parameter settings, making it more effective and useful than traditional clustering methods, especially for very large datasets.
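
The pipeline described in the abstract (one scan into a fixed-size image, a morphological operation, then connected-component extraction) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes 2-D points, value bounds known in advance, a 3x3 dilation as the morphological step, and 8-connected labelling by breadth-first search. All function names and parameters (build_grid, dilate, label_components, cluster_cells, size, min_count) are hypothetical and chosen for illustration only.

```python
from collections import deque

# Offsets of the 3x3 neighbourhood (including the centre cell).
NEIGHBOURS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def build_grid(points, bounds, size):
    """Single scan: count the points falling into each cell of a size x size grid."""
    (xmin, xmax), (ymin, ymax) = bounds
    grid = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(int((x - xmin) / (xmax - xmin) * size), size - 1)
        row = min(int((y - ymin) / (ymax - ymin) * size), size - 1)
        grid[row][col] += 1
    return grid


def dilate(mask):
    """Binary dilation with a 3x3 structuring element (assumed choice)."""
    n_rows, n_cols = len(mask), len(mask[0])
    out = [[False] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            if mask[r][c]:
                for dr, dc in NEIGHBOURS:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        out[rr][cc] = True
    return out


def label_components(mask):
    """Label 8-connected foreground regions; 0 marks background cells."""
    n_rows, n_cols = len(mask), len(mask[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    count = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if mask[r][c] and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    cr, cc = queue.popleft()
                    for dr, dc in NEIGHBOURS:
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < n_rows and 0 <= nc < n_cols
                                and mask[nr][nc] and labels[nr][nc] == 0):
                            labels[nr][nc] = count
                            queue.append((nr, nc))
    return labels, count


def cluster_cells(points, bounds, size=16, min_count=1):
    """Scan the data once, then cluster purely on the fixed-size grid."""
    grid = build_grid(points, bounds, size)
    mask = [[cell >= min_count for cell in row] for row in grid]
    return label_components(dilate(mask))


if __name__ == "__main__":
    pts = [(0.10, 0.10), (0.15, 0.12), (0.12, 0.18),
           (0.80, 0.85), (0.85, 0.80), (0.90, 0.90)]
    labels, count = cluster_cells(pts, bounds=((0.0, 1.0), (0.0, 1.0)))
    print("clusters found:", count)
```

In this sketch, everything after build_grid operates on the grid rather than on the original points, so changing min_count or the dilation step does not require rescanning the data. Its cost after the scan depends on the grid size rather than on the number of points.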