Nonlinear Data Analysis Using a New Hybrid Data Clustering Algorithm

  • Authors:
  • Ureerat Wattanachon; Jakkarin Suksawatchon; Chidchanok Lursinsap

  • Affiliations:
  • Department of Computer Science, Faculty of Science, Burapha University, Chonburi, Thailand 20131; Department of Computer Science, Faculty of Science, Burapha University, Chonburi, Thailand 20131; Advanced Virtual and Intelligent Computing (AVIC) Center, Department of Mathematics, Chulalongkorn University, Bangkok, Thailand 10330

  • Venue:
  • PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2009

Abstract

Existing clustering algorithms, such as single-link clustering, k-means, CURE, and CSM, are designed to find clusters based on pre-defined parameters specified by users. These algorithms may fail if the chosen parameters are inappropriate for the data set being clustered. Most of these algorithms work very well for compact, hyperspherical clusters. In this paper, a new hybrid clustering algorithm called Self-Partition and Self-Merging (SPSM) is proposed. The SPSM algorithm partitions the input data set into several subclusters in the first phase and then removes the noisy data in the second phase. In the third phase, the normal subclusters are repeatedly merged to form larger clusters based on inter-cluster and intra-cluster distance criteria. The experimental results show that the SPSM algorithm handles noisy data sets very efficiently and clusters data sets of arbitrary shape and differing density.
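
The three-phase structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' SPSM implementation: the abstract does not specify how each phase works, so every rule below is an assumption chosen for illustration. Phase 1 uses a simple leader-style pass with a hypothetical `radius`, phase 2 treats subclusters smaller than a hypothetical `min_size` as noise, and phase 3 merges two subclusters when their closest-pair (inter-cluster) distance is at most a hypothetical `factor` times the larger of their mean nearest-neighbour (intra-cluster) distances. The function name `spsm_sketch` is likewise made up.

```python
import math
from itertools import combinations


def partition(points, radius):
    """Phase 1 (assumed rule): put each point in the first subcluster whose
    seed lies within `radius`; otherwise start a new subcluster."""
    subclusters = []  # each subcluster is a list of points; element 0 is its seed
    for p in points:
        for sc in subclusters:
            if math.dist(p, sc[0]) <= radius:
                sc.append(p)
                break
        else:
            subclusters.append([p])
    return subclusters


def remove_noise(subclusters, min_size):
    """Phase 2 (assumed rule): drop subclusters with fewer than `min_size` points."""
    return [sc for sc in subclusters if len(sc) >= min_size]


def inter_distance(a, b):
    """Distance between the closest pair of points drawn from a and b."""
    return min(math.dist(p, q) for p in a for q in b)


def intra_distance(sc):
    """Mean nearest-neighbour distance inside one subcluster."""
    if len(sc) < 2:
        return 0.0
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(sc) if j != i)
        for i, p in enumerate(sc)
    ]
    return sum(nearest) / len(nearest)


def merge(subclusters, factor):
    """Phase 3 (assumed criterion): union two subclusters when their
    inter-cluster distance is at most `factor` times the larger of their
    intra-cluster distances; connected groups become the final clusters."""
    parent = list(range(len(subclusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    intra = [intra_distance(sc) for sc in subclusters]
    for i, j in combinations(range(len(subclusters)), 2):
        limit = factor * max(intra[i], intra[j])
        if inter_distance(subclusters[i], subclusters[j]) <= limit:
            parent[find(i)] = find(j)

    clusters = {}
    for i, sc in enumerate(subclusters):
        clusters.setdefault(find(i), []).extend(sc)
    return list(clusters.values())


def spsm_sketch(points, radius=1.0, min_size=2, factor=2.0):
    """Run the three illustrative phases in order."""
    return merge(remove_noise(partition(points, radius), min_size), factor)


# Toy data: an elongated group, a compact group, and one isolated point.
line = [(0.5 * k, 0.0) for k in range(6)]
blob = [(10.0, 10.0), (10.5, 10.0), (11.0, 10.0)]
outlier = [(50.0, 50.0)]
clusters = spsm_sketch(line + blob + outlier)
```

On this toy input the elongated group is first split into two subclusters and then rejoined in the merging phase, while the isolated point is discarded as noise. Unlike the algorithm the abstract describes, this sketch still depends on user-chosen parameters, so it shows only the order of the phases, not how SPSM avoids parameter tuning.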