Optimal bandwidth selection for density-based clustering

  • Authors:
  • Hong Jin;Shuliang Wang;Qian Zhou;Ying Li

  • Affiliations:
  • State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China;State Key Laboratory of Software Engineering and International School of Software, Wuhan University, Wuhan, China;International School of Software, Wuhan University, Wuhan, China;School of Mathematics and Statistics, Wuhan University, Wuhan, China

  • Venue:
  • DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications
  • Year:
  • 2011

Abstract

Cluster analysis has long played an important role in a wide variety of data applications. When clusters are irregular or intertwined, density-based clustering has proved to be much more efficient. The quality of the clustering result depends on an adequate choice of parameters. However, without sufficient domain knowledge, setting these parameters is difficult in practice. In this paper, a new method is proposed to automatically find the optimal value of the bandwidth parameter. It infers the most suitable value from a constructed parameter-estimation model: based on Bayes' theorem, the most probable value of the bandwidth is obtained in accordance with the inherent distribution characteristics of the original data set. Clusters can then be identified using the determined parameter value. Experimental results show that the proposed method offers complementary advantages for density-based clustering algorithms.
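The abstract does not give the estimation model itself, so the following is only a minimal sketch of the general idea of choosing a bandwidth by maximizing a posterior, not the authors' method. It assumes a one-dimensional Gaussian kernel density estimate, a leave-one-out likelihood, and a flat prior over a grid of candidate bandwidths; the names `loo_log_likelihood`, `map_bandwidth`, and `log_prior` are hypothetical.

```python
import numpy as np

def loo_log_likelihood(x, h):
    """Leave-one-out log-likelihood of a 1-D Gaussian KDE with bandwidth h."""
    n = x.size
    diffs = (x[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * diffs ** 2) / (np.sqrt(2.0 * np.pi) * h)
    np.fill_diagonal(k, 0.0)  # exclude each point from its own density estimate
    dens = k.sum(axis=1) / (n - 1)
    return float(np.sum(np.log(dens + 1e-300)))  # small offset avoids log(0)

def map_bandwidth(x, candidates, log_prior=None):
    """Return the candidate maximizing log-likelihood + log-prior (assumed model)."""
    if log_prior is None:
        log_prior = lambda h: 0.0  # flat prior: an assumption, not from the paper
    scores = np.array([loo_log_likelihood(x, h) + log_prior(h) for h in candidates])
    return float(candidates[int(np.argmax(scores))]), scores

# Synthetic data: two well-separated groups (illustrative only).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3.0, 0.5, 100), rng.normal(3.0, 0.5, 100)])
candidates = np.linspace(0.05, 3.0, 60)
h_map, scores = map_bandwidth(x, candidates)
print("selected bandwidth:", h_map)
```

The selected value could then be passed as the bandwidth (or neighborhood radius) of a density-based clustering routine. A non-flat `log_prior` would let prior beliefs about plausible scales influence the choice.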