Automated constraint selection for semi-supervised clustering algorithm

  • Authors:
  • Carlos Ruiz;Carlos G. Vallejo;Myra Spiliopoulou;Ernestina Menasalvas

  • Affiliations:
  • Facultad de Informática, Universidad Politécnica de Madrid, Spain;Department of Computer Languages and Systems, Universidad de Sevilla, Spain;Faculty of Computer Science, Magdeburg University, Germany;Facultad de Informática, Universidad Politécnica de Madrid, Spain

  • Venue:
  • CAEPIA'09 Proceedings of the Current topics in artificial intelligence, and 13th conference on Spanish association for artificial intelligence
  • Year:
  • 2009

Abstract

The incorporation of background knowledge into unsupervised algorithms has been shown to improve both model quality and execution speed. However, these gains depend on the quantity and quality of the background knowledge being exploited. In this work, we study the problem of selecting Must-Link and Cannot-Link constraints for semi-supervised clustering. We propose "ConstraintSelector", an algorithm that takes as input a set of labeled data instances, from which constraints can be derived, ranks these instances by their usability, and then derives constraints from the top-ranked instances only. Our experiments show that ConstraintSelector selects, and thereby reduces, the set of candidate constraints without compromising the quality of the derived model.
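The rank-then-derive idea can be illustrated with a minimal Python sketch. It assumes the usual convention that two labeled instances with the same label yield a Must-Link pair and two with different labels yield a Cannot-Link pair. The abstract does not say how ConstraintSelector measures usability, so the scoring function here (`centroid_margin_scores`, the distance to the nearest other-class centroid minus the distance to the instance's own class centroid) is a hypothetical placeholder, not the authors' criterion; `derive_constraints` and `select_constraints` are likewise illustrative names.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # a constraint between two instance indices


def centroid_margin_scores(points: Sequence[Sequence[float]],
                           labels: Sequence[int]) -> List[float]:
    """HYPOTHETICAL usability score (not from the paper).

    Score = distance to nearest other-class centroid minus distance to own
    class centroid, so instances deep inside their class rank highest.
    Requires at least two distinct labels.
    """
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        members = [p for p, l in zip(points, labels) if l == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    scores = []
    for p, l in zip(points, labels):
        own = math.dist(p, centroids[l])
        other = min(math.dist(p, centroids[c]) for c in classes if c != l)
        scores.append(other - own)
    return scores


def derive_constraints(labels: Sequence[int],
                       indices: Sequence[int]) -> Tuple[List[Pair], List[Pair]]:
    """Same label -> Must-Link, different label -> Cannot-Link."""
    must_link, cannot_link = [], []
    for i, j in combinations(indices, 2):
        (must_link if labels[i] == labels[j] else cannot_link).append((i, j))
    return must_link, cannot_link


def select_constraints(points, labels, k, score_fn=centroid_margin_scores):
    """Rank labeled instances by score_fn, keep the top k, derive pairs from them only."""
    scores = score_fn(points, labels)
    ranked = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    top = sorted(ranked[:k])  # restore index order for readable pairs
    return derive_constraints(labels, top)


# Usage: two well-separated classes, each with one instance near the boundary.
points = [(0.0, 0.0), (0.2, 0.1), (1.5, 1.4), (5.0, 5.0), (5.1, 4.9), (3.4, 3.6)]
labels = [0, 0, 0, 1, 1, 1]
ml, cl = select_constraints(points, labels, k=4)
print(ml)  # [(0, 1), (3, 4)]
print(cl)  # [(0, 3), (0, 4), (1, 3), (1, 4)]
```

In this toy run the two boundary instances (indices 2 and 5) score lowest and are dropped, so 6 pairs are derived instead of the 15 obtainable from all six labeled instances. Swapping in a different `score_fn` changes which instances are kept without touching the derivation step.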