In search of optimal centroids on data clustering using a binary search algorithm

  • Authors:
  • Abdolreza Hatamlou

  • Affiliations:
  • -

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2012

Quantified Score

Hi-index 0.10

Visualization

Abstract

Data clustering is an important technique in data mining. It is a method of partitioning data into clusters, in which each cluster must have data of great similarity and different clusters must have data of high dissimilarity. A lot of clustering algorithms are found in the literature. In general, there is no single algorithm that is suitable for all types of data, conditions and applications. Each algorithm has its own advantages, limitations and shortcomings. Therefore, introducing novel and effective approaches for data clustering is an open and active research area. This paper presents a novel binary search algorithm for data clustering that not only finds high quality clusters but also converges to the same solution in different runs. In the proposed algorithm a set of initial centroids are chosen from different parts of the test dataset and then optimal locations for the centroids are found by thoroughly exploring around of the initial centroids. The simulation results using six benchmark datasets from the UCI Machine Learning Repository indicate that proposed algorithm can efficiently be used for data clustering.