Automated constraint selection for semi-supervised clustering algorithm

Authors:
Carlos Ruiz; Carlos G. Vallejo; Myra Spiliopoulou; Ernestina Menasalvas
Affiliations:
Facultad de Informatica, Universidad Politecnica de Madrid, Spain; Department of Computer Languages and Systems, Universidad de Sevilla, Spain; Faculty of Computer Science, Magdeburg University, Germany; Facultad de Informatica, Universidad Politecnica de Madrid, Spain
Venue:
CAEPIA'09 Proceedings of the Current topics in artificial intelligence, and 13th conference on Spanish association for artificial intelligence
Year:
2009

Abstract

The incorporation of background knowledge in unsupervised algorithms has been shown to yield performance improvements in terms of model quality and execution speed. However, performance depends on the quantity and quality of the background knowledge being exploited. In this work, we study the issue of selecting Must-Link and Cannot-Link constraints for semi-supervised clustering. We propose "ConstraintSelector", an algorithm that takes as input a set of labeled data instances from which constraints can be derived, ranks these instances by their usability, and then derives constraints from the top-ranked instances only. Our experiments show that ConstraintSelector selects from, and thereby reduces, the set of candidate constraints without compromising the quality of the derived model.
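
The pipeline outlined in the abstract (rank labeled instances, keep the top-ranked ones, derive constraints from them) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' ConstraintSelector: the abstract does not say how usability is computed, so the `usability` callable, the `top_k` cutoff, and the function name `select_constraints` are all hypothetical. The only part taken as given is the usual convention that two instances with the same label yield a Must-Link constraint and two with different labels yield a Cannot-Link constraint.

```python
from itertools import combinations
from typing import Callable, Hashable, List, Sequence, Tuple

Pair = Tuple[int, int]


def select_constraints(
    instances: Sequence[object],
    labels: Sequence[Hashable],
    usability: Callable[[object, Hashable], float],  # hypothetical scoring function
    top_k: int,                                      # hypothetical cutoff
) -> Tuple[List[Pair], List[Pair]]:
    """Rank labeled instances by a usability score, keep the top_k,
    and derive pairwise constraints among the kept instances only."""
    # Rank instance indices from most to least usable (ties keep input order).
    ranked = sorted(
        range(len(instances)),
        key=lambda i: usability(instances[i], labels[i]),
        reverse=True,
    )
    selected = sorted(ranked[:top_k])

    must_link: List[Pair] = []
    cannot_link: List[Pair] = []
    # Same label -> Must-Link, different labels -> Cannot-Link.
    for i, j in combinations(selected, 2):
        if labels[i] == labels[j]:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link
```

With n labeled instances the full candidate set contains n(n-1)/2 pairwise constraints, so keeping only `top_k` instances reduces it to top_k(top_k-1)/2 pairs. Any real usability measure would be passed in as `usability`.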