A meta-learning approach for determining the number of clusters with consideration of nearest neighbors

  • Authors:
  • Jong-Seok Lee; Sigurdur Olafsson

  • Affiliations:
  • Department of Systems Management Engineering, Sungkyunkwan University, Suwon 440-746, Republic of Korea; Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA 50011, USA

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2013

Quantified Score

Hi-index 0.07

Abstract

An important and challenging problem in data clustering is determining the best number of clusters. A variety of estimation methods have been proposed over the years to address this problem. Most of these methods depend on several nontrivial assumptions about the data structure, and they may therefore fail to discover the true clusters in a dataset that does not satisfy those assumptions. We develop a new approach that takes as its starting point the simple and intuitive observation that close objects should fall within the same cluster, whereas distant ones should not. Based on this notion, we use a new measure of clustering quality called disconnectivity, together with existing goodness measures, and we embed these measures into a meta-learning approach for estimating the number of clusters. A simulation experiment based on 13 representative models and an application to real-world datasets show the effectiveness of the proposed method.
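
The nearest-neighbor intuition in the abstract (close objects should share a cluster) can be illustrated with a small sketch. The abstract does not give the definition of disconnectivity, so the function below is not the authors' measure: `nn_disagreement`, its `n_neighbors` parameter, and the toy data are hypothetical names chosen for illustration. It simply reports the fraction of nearest-neighbor pairs that a candidate clustering splits across clusters, assuming Euclidean distance.

```python
import math


def nn_disagreement(points, labels, n_neighbors=1):
    """Fraction of (point, nearest neighbor) pairs placed in different clusters.

    Hypothetical illustration only; this is NOT the paper's disconnectivity
    measure, whose definition is not stated in the abstract.
    Requires n_neighbors <= len(points) - 1.
    """
    n = len(points)
    split = 0
    for i in range(n):
        # Rank all other points by Euclidean distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        # Count the nearest neighbors that carry a different cluster label.
        for j in others[:n_neighbors]:
            if labels[i] != labels[j]:
                split += 1
    return split / (n * n_neighbors)


# Toy data (assumed): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (2, 0), (10, 10), (10, 11), (12, 10)]

# A two-cluster labeling that matches the groups, and a three-cluster
# labeling that splits the first group.
labels_two = [0, 0, 0, 1, 1, 1]
labels_three = [0, 0, 1, 2, 2, 2]

score_two = nn_disagreement(points, labels_two)
score_three = nn_disagreement(points, labels_three)
```

In this toy case the two-cluster labeling separates no nearest-neighbor pair, while the three-cluster labeling separates one point from its nearest neighbor, so the score rises when a natural group is over-split. A score of this kind could serve as one input among several goodness measures when comparing candidate numbers of clusters, though how the paper combines its measures is not described in the abstract.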