Semi-supervised clustering: a case study

  • Authors:
  • Andreia Silva; Cláudia Antunes

  • Affiliations:
  • Department of Computer Science and Engineering, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal; Department of Computer Science and Engineering, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal

  • Venue:
  • MLDM'12 Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2012

Abstract

The use of domain knowledge to improve the mining process is beginning to yield its first results. For example, domain-driven constraints allow the discovery process to focus on patterns that are more useful from the user's point of view. Semi-supervised clustering is a technique that partitions unlabeled data by making use of domain knowledge, usually expressed as pairwise constraints among instances or simply as an additional set of labeled instances. This work studies the efficacy of semi-supervised clustering on the problem of determining whether a movie will win an award, based only on the movie's characteristics and on ratings given by viewers. Experimental results show that, in general, semi-supervised clustering achieves better accuracy than unsupervised methods.
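
To illustrate the second form of domain knowledge mentioned in the abstract (an additional set of labeled instances), the following is a minimal sketch of a seeded variant of k-means, in which the labeled instances fix the initial centroids. It is an assumption made for illustration and not necessarily the algorithm evaluated in the paper; the function name `seeded_kmeans`, the two features (average rating and a scaled rating count), and the toy values are all hypothetical.

```python
import math


def seeded_kmeans(points, seeds, n_iter=20):
    """Cluster `points` with k-means, starting from labeled `seeds`.

    points: list of feature tuples (unlabeled instances)
    seeds:  dict mapping a class label to a non-empty list of labeled tuples
    Returns (assignments, centroids); assignments[i] is the label of points[i].
    """
    def mean(rows):
        # Component-wise mean of a non-empty list of tuples.
        return tuple(sum(col) / len(rows) for col in zip(*rows))

    # Domain knowledge enters here: each centroid starts at the mean
    # of the labeled instances of its class, instead of a random point.
    centroids = {label: mean(rows) for label, rows in seeds.items()}
    assignments = []
    for _ in range(n_iter):
        # Assignment step: each unlabeled point joins its nearest centroid.
        new_assignments = [
            min(centroids, key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: seeds always stay in their own cluster.
        for label in centroids:
            members = list(seeds[label]) + [
                p for p, a in zip(points, assignments) if a == label
            ]
            centroids[label] = mean(members)
    return assignments, centroids


# Hypothetical toy data: (average rating, scaled number of ratings).
points = [(4.5, 0.9), (4.2, 0.8), (2.1, 0.2), (2.5, 0.3)]
seeds = {"award": [(4.4, 0.85)], "no_award": [(2.0, 0.25)]}
assignments, centroids = seeded_kmeans(points, seeds)
print(assignments)
```

With purely unsupervised k-means the initial centroids would be chosen without reference to any label, so the resulting clusters would not necessarily align with the award/no-award distinction; seeding is one simple way to bias the partition toward the classes of interest.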