A Data Set Oriented Approach for Clustering Algorithm Selection

  • Authors:
  • Maria Halkidi; Michalis Vazirgiannis

  • Venue:
  • PKDD '01 Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery
  • Year:
  • 2001

Abstract

In recent years, the availability of huge transactional and experimental data sets, together with the growing requirements of data mining, has created a need for clustering algorithms that scale and can be applied in diverse domains. A variety of algorithms have therefore been proposed; they apply to different fields and may produce different partitionings of the same data set, depending on the specific clustering criterion used. Moreover, since clustering is an unsupervised process, most algorithms rely on assumptions in order to define a partitioning of a data set. In most applications, then, the final clustering scheme requires some sort of evaluation. In this paper we present a clustering validity procedure which, taking into account the inherent features of a data set, evaluates the results of different clustering algorithms applied to it. A validity index, S_Dbw, is defined according to well-known clustering criteria so as to enable the selection of the algorithm that provides the best partitioning of a data set. We evaluate the reliability of our approach both theoretically and experimentally, considering three representative clustering algorithms run on synthetic and real data sets. It performed favorably in all studies, giving an indication of the algorithm that is suitable for the considered application.
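The selection procedure described in the abstract (score the partitioning produced by each algorithm with a validity index, then keep the best-scoring one) can be illustrated with a minimal sketch. The abstract does not give the definition of S_Dbw, so the index below is a hypothetical stand-in (mean point-to-centroid distance divided by the smallest centroid-to-centroid distance, lower is better) and is not the authors' index. The names `stand_in_index`, `select_best`, `algorithm_A`, and `algorithm_B` are likewise assumptions made for illustration.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def stand_in_index(points, labels):
    """Hypothetical stand-in validity index (NOT S_Dbw).

    Ratio of compactness (mean distance of each point to its cluster
    centroid) to separation (smallest distance between two centroids).
    Lower values mean more compact, better separated clusters.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    centers = {label: centroid(members) for label, members in clusters.items()}
    compactness = sum(
        dist(point, centers[label]) for point, label in zip(points, labels)
    ) / len(points)
    keys = list(centers)
    separation = min(
        dist(centers[a], centers[b])
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
    )
    # Coinciding centroids mean no separation at all: worst possible score.
    return float("inf") if separation == 0 else compactness / separation


def select_best(points, partitionings, index=stand_in_index):
    """Score every candidate partitioning and return (best_name, all_scores)."""
    scores = {name: index(points, labels) for name, labels in partitionings.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# Label vectors as two hypothetical algorithms might return them.
partitionings = {
    "algorithm_A": [0, 0, 0, 1, 1, 1],  # recovers the two groups
    "algorithm_B": [0, 1, 0, 1, 0, 1],  # mixes the two groups
}

best, scores = select_best(points, partitionings)
print(best, scores)
```

Because the index is passed in as a parameter, an implementation of S_Dbw taken from the paper itself could replace the stand-in without changing the selection step, provided it follows the same lower-is-better convention assumed here.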