Semi-supervised Density-Based Clustering

Authors:
Levi Lelis;Jörg Sander
Affiliations:
-;-
Venue:
ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
Year:
2009

Citing 0
Cited 5

Semi-supervised parameter-free divisive hierarchical clustering of categorical data

PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Graph-based clustering with constraints

PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II
Summarization and matching of density-based clusters in streaming environments

Proceedings of the VLDB Endowment
A Simpler and More Accurate AUTO-HDS Framework for Clustering and Visualization of Biological Data

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Scalable parallel OPTICS data clustering using graph algorithmic techniques

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Quantified Score

Hi-index	0.00

Visualization

Abstract

Most of the effort in the semi-supervised clustering literature was devoted to variations of the K-means algorithm. In this paper we show how background knowledge can be used to bias a partitional density-based clustering algorithm. Our work describes how labeled objects can be used to help the algorithm detecting suitable density parameters for the algorithm to extract density-based clusters in specific parts of the feature space. Considering the set of constraints estabilished by the labeled dataset we show that our algorithm, called SSDBSCAN, automatically finds density parameters for each natural cluster in a dataset. Four of the most interesting characteristics of SSDBSCAN are that (1) it only requires a single, robust input parameter, (2) it does not need any user intervention, (3) it automaticaly finds the noise objects according to the density of the natural clusters and (4) it is able to find the natural cluster structure even when the density among clusters vary widely. The algorithm presented in this paper is evaluated with artificial and real-world datasets, demonstrating better results when compared to other unsupervised and semi-supervised density-based approaches.