The incorporation of background knowledge in unsupervised algorithms has been shown to improve both model quality and execution speed. However, these gains depend on the quantity and quality of the background knowledge being exploited. In this work, we study the problem of selecting Must-Link and Cannot-Link constraints for semi-supervised clustering. We propose "ConstraintSelector", an algorithm that takes as input a set of labeled data instances from which constraints can be derived, ranks these instances by their usability, and then derives constraints from the top-ranked instances only. Our experiments show that ConstraintSelector selects, and thereby reduces, the set of candidate constraints without compromising the quality of the derived model.
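The rank-then-derive idea can be sketched in a few lines. This is a minimal illustration, not ConstraintSelector itself: the abstract does not define the usability measure, so the sketch assumes a typicality-style score (mean similarity to same-label instances divided by mean similarity to other-label instances) and an illustrative similarity of 1 / (1 + Euclidean distance). The names `similarity`, `typicality` and `select_constraints`, and the cutoff parameter `k`, are hypothetical. Pairs of top-ranked instances with equal labels become Must-Link constraints, and pairs with different labels become Cannot-Link constraints.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Pair = Tuple[int, int]


def similarity(a: Point, b: Point) -> float:
    """Illustrative similarity (an assumption): 1 / (1 + Euclidean distance)."""
    dist = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return 1.0 / (1.0 + dist)


def typicality(i: int, X: Sequence[Point], y: Sequence[int]) -> float:
    """Hypothetical usability score: mean similarity to same-label instances
    divided by mean similarity to instances with a different label."""
    intra = [similarity(X[i], X[j]) for j in range(len(X)) if j != i and y[j] == y[i]]
    extra = [similarity(X[i], X[j]) for j in range(len(X)) if y[j] != y[i]]
    if not intra:
        return 0.0  # no same-label neighbour: treat as least usable
    if not extra:
        return float("inf")  # only one label present: nothing to contrast against
    return (sum(intra) / len(intra)) / (sum(extra) / len(extra))


def select_constraints(
    X: Sequence[Point], y: Sequence[int], k: int
) -> Tuple[List[Pair], List[Pair]]:
    """Rank labeled instances by the score, keep the top k, and derive
    Must-Link / Cannot-Link pairs from those instances only."""
    ranked = sorted(range(len(X)), key=lambda i: typicality(i, X, y), reverse=True)
    top = sorted(ranked[:k])  # restore index order for readable pair lists
    must_link: List[Pair] = []
    cannot_link: List[Pair] = []
    for i, j in combinations(top, 2):
        (must_link if y[i] == y[j] else cannot_link).append((i, j))
    return must_link, cannot_link
```

For example, with `X = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 5)]`, `y = [0, 0, 1, 1, 0]` and `k = 4`, the instance at `(5, 5)` lies between the two groups, receives the lowest score and is excluded. Only 6 of the 10 possible pairwise constraints are derived: Must-Link `(0, 1)`, `(2, 3)` and Cannot-Link `(0, 2)`, `(0, 3)`, `(1, 2)`, `(1, 3)`. In general, keeping k of n labeled instances shrinks the candidate set from n(n-1)/2 to k(k-1)/2 pairs.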