Density-based semi-supervised clustering

  • Authors:
  • Carlos Ruiz; Myra Spiliopoulou; Ernestina Menasalvas

  • Affiliations:
  • Facultad de Informática, Universidad Politécnica de Madrid, Madrid, Spain; Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany; Facultad de Informática, Universidad Politécnica de Madrid, Madrid, Spain

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 2010

Abstract

Semi-supervised clustering methods guide the data partitioning and grouping process by exploiting background knowledge, often in the form of constraints. In this study, we propose a semi-supervised density-based clustering method. Density-based algorithms are traditionally used in applications where the anticipated groups are expected to assume non-spherical shapes and/or differ in cardinality or density. Many such applications, including those on GIS, lend themselves to constraint-based clustering, because there is a priori knowledge on the group membership of some records. In fact, constraints might be the only way to prevent the formation of clusters that do not conform to the application's semantics. For example, geographical objects such as houses that are separated by a borderline or a river may not be assigned to the same cluster, regardless of their physical proximity. We first provide an overview of constraint-based clustering for different families of clustering algorithms. Then, we concentrate on the family of density-based algorithms and select the algorithm DBSCAN, which we enhance with Must-Link and Cannot-Link constraints. Our enhancement is seamless: we allow DBSCAN to build temporary clusters, which we then split or merge according to the constraints. Our experiments on synthetic and real datasets show that our approach improves the performance of the algorithm.
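
The split-or-merge idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it runs a plain DBSCAN first and then post-processes the temporary clusters. The function names (dbscan, apply_constraints), the nearest-endpoint rule used for splitting, and the toy data are all assumptions made for illustration, and the sketch assumes the constraints do not contradict each other.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbors(j)
            if len(nj) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nj)
    return labels


def apply_constraints(points, labels, must_link, cannot_link):
    """Hypothetical post-processing: split on Cannot-Link, merge on Must-Link."""
    labels = list(labels)
    # Split: if a Cannot-Link pair shares a cluster, move every member that is
    # nearer to b than to a into a fresh cluster (an assumed, simplistic rule).
    for a, b in cannot_link:
        if labels[a] == -1 or labels[a] != labels[b]:
            continue
        old, new = labels[a], max(labels) + 1
        for k, lab in enumerate(labels):
            if lab == old and dist(points[k], points[b]) < dist(points[k], points[a]):
                labels[k] = new
    # Merge: give both endpoints of a Must-Link pair the same label.
    for a, b in must_link:
        la, lb = labels[a], labels[b]
        if la == -1 and lb == -1:
            labels[a] = labels[b] = max(labels) + 1
        elif la == -1:
            labels[a] = lb
        elif lb == -1:
            labels[b] = la
        elif la != lb:
            labels = [la if lab == lb else lab for lab in labels]
    return labels


# Toy data: six evenly spaced houses on a line, plus two distant pairs.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0),
          (10, 0), (11, 0), (20, 0), (21, 0)]
temporary = dbscan(points, eps=1.1, min_pts=2)
final = apply_constraints(points, temporary,
                          must_link=[(6, 8)], cannot_link=[(0, 5)])
print(temporary)
print(final)
```

In this toy run, DBSCAN alone chains the six evenly spaced points into one temporary cluster. The Cannot-Link pair (0, 5) then splits that cluster in two, and the Must-Link pair (6, 8) merges the two distant pairs. The sketch applies splits before merges and does not check whether a later merge re-violates a Cannot-Link constraint.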