Nonlinear Data Analysis Using a New Hybrid Data Clustering Algorithm

Authors:
Ureerat Wattanachon;Jakkarin Suksawatchon;Chidchanok Lursinsap
Affiliations:
Department of Computer Science, Faculty of Science, Burapha University, Chonburi, Thailand 20131;Department of Computer Science, Faculty of Science, Burapha University, Chonburi, Thailand 20131;Advanced Virtual and Intelligent Computing (AVIC) Center, Department of Mathematics, Chulalongkorn University, Bangkok, Thailand 10330
Venue:
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Year:
2009

Abstract

Existing clustering algorithms, such as single-link clustering, k-means, CURE, and CSM, are designed to find clusters based on pre-defined parameters specified by users. These algorithms may fail if the chosen parameters are inappropriate for the data set being clustered, and most of them perform best on compact, hyperspherical clusters. In this paper, a new hybrid clustering algorithm called Self-Partition and Self-Merging (SPSM) is proposed. In the first phase, SPSM partitions the input data set into several subclusters; in the second phase, it removes the noisy data. In the third phase, the remaining normal subclusters are repeatedly merged into larger clusters based on inter-cluster and intra-cluster distance criteria. Experimental results show that SPSM handles noisy data sets efficiently and can cluster data sets containing clusters of arbitrary shapes and different densities.
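The abstract does not say how the partitioning, noise detection, or merge criteria are actually defined, so the following is only a minimal sketch of the three-phase partition/denoise/merge structure under explicit assumptions. Phase 1 is assumed to be k-means with farthest-first seeding, phase 2 is assumed to discard subclusters smaller than a hypothetical `min_size`, and phase 3 is assumed to merge two clusters when their single-link gap is at most a hypothetical `factor` times the larger of their mean nearest-neighbour spacings (standing in for the inter-/intra-cluster distance criteria). All function and parameter names are illustrative, not taken from the paper.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(group: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty group of points."""
    return tuple(sum(coords) / len(group) for coords in zip(*group))


def self_partition(points: List[Point], k: int, iters: int = 20) -> List[List[Point]]:
    """Phase 1 (assumed): over-partition the data into k subclusters.

    Stand-in: farthest-first seeding followed by Lloyd (k-means) iterations.
    """
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        new_centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return [g for g in groups if g]


def remove_noise(subclusters: List[List[Point]], min_size: int):
    """Phase 2 (assumed): subclusters with fewer than min_size points are noise."""
    kept = [s for s in subclusters if len(s) >= min_size]
    noise = [p for s in subclusters if len(s) < min_size for p in s]
    return kept, noise


def intra_distance(cluster: List[Point]) -> float:
    """Assumed intra-cluster distance: mean nearest-neighbour distance."""
    if len(cluster) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(cluster):
        total += min(math.dist(p, q) for j, q in enumerate(cluster) if j != i)
    return total / len(cluster)


def inter_distance(a: List[Point], b: List[Point]) -> float:
    """Assumed inter-cluster distance: closest-pair (single-link) distance."""
    return min(math.dist(p, q) for p in a for q in b)


def self_merge(clusters: List[List[Point]], factor: float = 2.0) -> List[List[Point]]:
    """Phase 3 (assumed): repeatedly merge the closest qualifying pair."""
    clusters = [list(c) for c in clusters]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            gap = inter_distance(clusters[i], clusters[j])
            limit = factor * max(intra_distance(clusters[i]), intra_distance(clusters[j]))
            if gap <= limit and (best is None or gap < best[0]):
                best = (gap, i, j)
        if best is None:  # no pair satisfies the merge criterion
            return clusters
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # i < j, so index i stays valid


def spsm_sketch(points: List[Point], k: int, min_size: int = 3, factor: float = 2.0):
    """Hypothetical driver chaining the three assumed phases."""
    subclusters = self_partition(points, k)
    kept, noise = remove_noise(subclusters, min_size)
    return self_merge(kept, factor), noise


if __name__ == "__main__":
    blob_a = [(float(x), float(y)) for x in range(5) for y in range(5)]
    blob_b = [(float(x + 20), float(y)) for x in range(5) for y in range(5)]
    clusters, noise = spsm_sketch(blob_a + blob_b + [(50.0, 50.0)], k=5)
    print(sorted(len(c) for c in clusters), noise)  # [25, 25] [(50.0, 50.0)]
```

Comparing the single-link gap to each cluster's own nearest-neighbour spacing, rather than to a global threshold, is one simple way to let dense and sparse clusters use different merge scales, which is in the spirit of the density claim above; the paper's actual criteria may differ.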