Semi-supervised Density-Based Clustering

  • Authors:
  • Levi Lelis; Jörg Sander

  • Venue:
  • ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
  • Year:
  • 2009

Abstract

Most of the effort in the semi-supervised clustering literature has been devoted to variations of the K-means algorithm. In this paper we show how background knowledge can be used to bias a partitional density-based clustering algorithm. Our work describes how labeled objects can help the algorithm detect suitable density parameters for extracting density-based clusters in specific parts of the feature space. Using the set of constraints established by the labeled dataset, we show that our algorithm, called SSDBSCAN, automatically finds density parameters for each natural cluster in a dataset. Four of the most interesting characteristics of SSDBSCAN are that (1) it requires only a single, robust input parameter, (2) it does not need any user intervention, (3) it automatically identifies noise objects according to the density of the natural clusters, and (4) it is able to find the natural cluster structure even when density varies widely among clusters. The algorithm is evaluated on artificial and real-world datasets, where it achieves better results than other unsupervised and semi-supervised density-based approaches.
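The abstract does not spell out how labeled objects fix the density parameters, so the Python sketch below illustrates one plausible reading and should not be taken as the authors' implementation. From each labeled seed it grows a minimum-spanning-tree-style expansion over reachability distances (in the style of OPTICS). It stops at the first object carrying a different label and cuts the expansion at its largest edge, so each cluster ends up with its own density threshold and unclaimed objects become noise. Several choices here are assumptions made for illustration: the symmetric reachability distance max(core(p), core(q), d(p, q)), the hypothetical names `core_distances`, `expand_from_seed` and `ssdbscan_sketch`, the first-claim-wins rule for conflicts, and treating `min_pts` as the single input parameter.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple, Union

Point = Tuple[float, ...]


def core_distances(points: Sequence[Point], min_pts: int) -> List[float]:
    """Distance from each point to its min_pts-th nearest neighbor (the point itself counts)."""
    return [sorted(math.dist(p, q) for q in points)[min_pts - 1] for p in points]


def expand_from_seed(points: Sequence[Point], core: List[float],
                     labels: Dict[int, str], seed: int) -> Set[int]:
    """Hypothetical helper: grow a Prim-style tree from a labeled seed, always adding
    the unvisited point with the smallest reachability distance. On reaching a point
    with a different label, cut the tree at its heaviest edge and keep the part
    that contains the seed."""
    def rdist(i: int, j: int) -> float:
        # Assumed symmetric reachability distance between points i and j.
        return max(core[i], core[j], math.dist(points[i], points[j]))

    seed_label = labels[seed]
    order = [seed]                # points in the order they joined the tree
    weights = [float("-inf")]     # weight of the edge that brought each point in
    best = {j: rdist(seed, j) for j in range(len(points)) if j != seed}
    while best:
        j = min(best, key=lambda k: (best[k], k))  # cheapest edge, ties broken by index
        w = best.pop(j)
        other = labels.get(j)
        if other is not None and other != seed_label:
            # Heaviest edge so far, including the edge to the conflicting point;
            # everything added before that edge forms the cluster.
            all_w = weights + [w]
            cut = max(range(len(all_w)), key=lambda k: all_w[k])
            return set(order[:cut])
        order.append(j)
        weights.append(w)
        for k in best:  # only values change, so iterating while assigning is safe
            best[k] = min(best[k], rdist(j, k))
    return set(order)  # no conflicting label was ever reached


def ssdbscan_sketch(points: Sequence[Point], labels: Dict[int, str],
                    min_pts: int) -> List[Union[str, int]]:
    """Hypothetical driver: one expansion per labeled seed; -1 marks noise."""
    core = core_distances(points, min_pts)
    assignment: List[Union[str, int]] = [-1] * len(points)
    for seed in sorted(labels):
        for i in expand_from_seed(points, core, labels, seed):
            if assignment[i] == -1:  # first cluster to claim a point keeps it
                assignment[i] = labels[seed]
    return assignment


pts = [(0, 0), (0, 1), (1, 0), (1, 1),          # dense cluster A (spacing 1)
       (10, 10), (10, 12), (12, 10), (12, 12),  # sparser cluster B (spacing 2)
       (5, 30)]                                 # isolated point
print(ssdbscan_sketch(pts, {0: "A", 4: "B"}, min_pts=2))
# → ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', -1]
```

On this toy data the denser cluster A and the sparser cluster B are each cut at their own threshold, and the isolated point at (5, 30) is never claimed by either expansion, so it is reported as noise.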