Constraint Selection by Committee: An Ensemble Approach to Identifying Informative Constraints for Semi-supervised Clustering

Authors:
Derek Greene;Pádraig Cunningham
Affiliations:
University College Dublin, Ireland;University College Dublin, Ireland
Venue:
ECML '07 Proceedings of the 18th European conference on Machine Learning
Year:
2007

Citing 7

Silhouettes: a graphical aid to the interpretation and validation of cluster analysis
Journal of Computational and Applied Mathematics

Query by committee
COLT '92 Proceedings of the fifth annual workshop on Computational learning theory

Finding Consistent Clusters in Data Partitions
MCS '01 Proceedings of the Second International Workshop on Multiple Classifier Systems

Cluster ensembles --- a knowledge reuse framework for combining multiple partitions
The Journal of Machine Learning Research

A probabilistic framework for semi-supervised clustering
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Diverse ensembles for active learning
ICML '04 Proceedings of the twenty-first international conference on Machine learning

Semi-supervised protein classification using cluster kernels
Bioinformatics

Cited 4

Automated constraint selection for semi-supervised clustering algorithm
CAEPIA'09 Proceedings of the Current topics in artificial intelligence, and 13th conference on Spanish association for artificial intelligence

Consensus clustering based on constrained self-organizing map and improved Cop-Kmeans ensemble in intelligent decision support systems
Knowledge-Based Systems

Semi-supervised clustering ensemble based on multi-ant colonies algorithm
RSKT'12 Proceedings of the 7th international conference on Rough Sets and Knowledge Technology

Linear semi-supervised projection clustering by transferred centroid regularization
Journal of Intelligent Information Systems

Abstract

A number of clustering algorithms have been proposed for use in tasks where a limited degree of supervision is available. This prior knowledge is frequently provided in the form of pairwise must-link and cannot-link constraints. While the incorporation of pairwise supervision has the potential to improve clustering accuracy, the composition and cardinality of the constraint sets can significantly affect the level of improvement. We demonstrate that it is often possible to correctly "guess" a large number of constraints without supervision, using the co-associations between pairs of objects in an ensemble of clusterings. Along the same lines, we establish that constraints based on pairs with uncertain co-associations are particularly informative, if known. An evaluation on text data shows that this provides an effective criterion for identifying constraints, leading to a reduction in the level of supervision required to direct a clustering algorithm to an accurate solution.
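
The co-association idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `co_association` and `partition_pairs`, the thresholds `low` and `high`, and the rule of ranking uncertain pairs by distance from 0.5 are all assumptions made for illustration, since the abstract does not specify them. The sketch computes, for every pair of objects, the fraction of ensemble members that place the pair in the same cluster, then separates pairs into "guessed" must-link constraints, "guessed" cannot-link constraints, and uncertain pairs that might be worth sending to an oracle.

```python
import numpy as np


def co_association(labelings):
    """Fraction of clusterings that put each pair of objects in the same cluster."""
    labelings = np.asarray(labelings)  # shape: (n_clusterings, n_objects)
    same = labelings[:, :, None] == labelings[:, None, :]
    return same.mean(axis=0)  # shape: (n_objects, n_objects)


def partition_pairs(coassoc, low=0.1, high=0.9):
    """Split object pairs by co-association (thresholds are illustrative assumptions)."""
    n = coassoc.shape[0]
    rows, cols = np.triu_indices(n, k=1)  # each unordered pair once
    vals = coassoc[rows, cols]

    def pairs(index):
        return [(int(a), int(b)) for a, b in zip(rows[index], cols[index])]

    must_link = pairs(vals >= high)    # pairs the ensemble almost always joins
    cannot_link = pairs(vals <= low)   # pairs the ensemble almost always separates
    mid = np.flatnonzero((vals > low) & (vals < high))
    # Assumed ranking: the closer to 0.5, the more the ensemble disagrees.
    order = mid[np.argsort(np.abs(vals[mid] - 0.5), kind="stable")]
    uncertain = pairs(order)
    return must_link, cannot_link, uncertain


# Toy ensemble: four clusterings of four objects (made-up labels).
ensemble = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
must_link, cannot_link, uncertain = partition_pairs(co_association(ensemble))
```

In this toy ensemble, objects 0 and 1 are always clustered together and so become a guessed must-link pair, while pairs such as (2, 3), on which the clusterings disagree, fall into the uncertain group.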