Density-based semi-supervised clustering

Authors:
Carlos Ruiz;Myra Spiliopoulou;Ernestina Menasalvas
Affiliations:
Facultad de Informática, Universidad Politecnica de Madrid, Madrid, Spain;Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany;Facultad de Informática, Universidad Politecnica de Madrid, Madrid, Spain
Venue:
Data Mining and Knowledge Discovery
Year:
2010

Citing 27
Cited 2

The role of domain knowledge in data mining

CIKM '95 Proceedings of the fourth international conference on Information and knowledge management
CURE: an efficient clustering algorithm for large databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
OPTICS: ordering points to identify the clustering structure

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Multidimensional binary search trees used for associative searching

Communications of the ACM
Constraint-Based, Multidimensional Data Mining

Computer
Chameleon: Hierarchical Clustering Using Dynamic Modeling

Computer
Constrained K-means Clustering with Background Knowledge

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
From Instance-level Constraints to Space-Level Constraints: Making the Most of Prior Knowledge in Data Clustering

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Semi-supervised Clustering by Seeding

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Clustering with Instance-level Constraints

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
The Role of Domain Knowledge in a Large Scale Data Mining Project

SETN '02 Proceedings of the Second Hellenic Conference on AI: Methods and Applications of Artificial Intelligence
WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
STING: A Statistical Information Grid Approach to Spatial Data Mining

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Clustering Spatial Data in the Presence of Obstacles: a Density-Based Approach

IDEAS '02 Proceedings of the 2002 International Symposium on Database Engineering & Applications
Clustering Spatial Data when Facing Physical Constraints

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Uncertainty Handling and Quality Assesment in Data Mining

Uncertainty Handling and Quality Assesment in Data Mining
Intelligent clustering with instance-level constraints

Intelligent clustering with instance-level constraints
Adaptive duplicate detection using learnable string similarity measures

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
A probabilistic framework for semi-supervised clustering

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Integrating constraints and metric learning in semi-supervised clustering

ICML '04 Proceedings of the twenty-first international conference on Machine learning
A Framework for Semi-Supervised Learning Based on Subjective and Objective Clustering Criteria

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Efficient incremental constrained clustering

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
C-DBSCAN: Density-Based Clustering with Constraints

RSFDGrC '07 Proceedings of the 11th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing
C-DenStream: Using Domain Knowledge on a Data Stream

DS '09 Proceedings of the 12th International Conference on Discovery Science
Measuring constraint-set utility for partitional clustering algorithms

PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases
Agglomerative hierarchical clustering with constraints: theoretical and empirical results

PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases

Graph-based clustering with constraints

PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II
Semi-supervised clustering with discriminative random fields

Pattern Recognition

Quantified Score

Hi-index	0.00

Visualization

Abstract

Semi-supervised clustering methods guide the data partitioning and grouping process by exploiting background knowledge, among else in the form of constraints. In this study, we propose a semi-supervised density-based clustering method. Density-based algorithms are traditionally used in applications, where the anticipated groups are expected to assume non-spherical shapes and/or differ in cardinality or density. Many such applications, among else those on GIS, lend themselves to constraint-based clustering, because there is a priori knowledge on the group membership of some records. In fact, constraints might be the only way to prevent the formation of clusters that do not conform to the applications' semantics. For example, geographical objects, e.g. houses, separated by a borderline or a river may not be assigned to the same cluster, independently of their physical proximity. We first provide an overview of constraint-based clustering for different families of clustering algorithms. Then, we concentrate on the density-based algorithms' family and select the algorithm DBSCAN, which we enhance with Must-Link and Cannot-Link constraints. Our enhancement is seamless: we allow DBSCAN to build temporary clusters, which we then split or merge according to the constraints. Our experiments on synthetic and real datasets show that our approach improves the performance of the algorithm.