Semi-supervised clustering: a case study

Authors:
Andreia Silva;Cláudia Antunes
Affiliations:
Department of Computer Science and Engineering, Instituto Superior Técnico --- Technical University of Lisbon, Lisbon, Portugal;Department of Computer Science and Engineering, Instituto Superior Técnico --- Technical University of Lisbon, Lisbon, Portugal
Venue:
MLDM'12 Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2012

Abstract

The use of domain knowledge to improve the mining process is beginning to yield its first results. For example, domain-driven constraints allow the discovery process to focus on patterns that are more useful from the user's point of view. Semi-supervised clustering is a technique that partitions unlabeled data by making use of domain knowledge, usually expressed either as pairwise constraints among instances or as an additional set of labeled instances. This work studies the efficacy of semi-supervised clustering on the problem of determining whether a movie will win an award, based only on the movie's characteristics and on the ratings given by spectators. Experimental results show that, in general, semi-supervised clustering achieves better accuracy than unsupervised methods.
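The two forms of domain knowledge mentioned above (labeled instances and pairwise constraints) can both be plugged into k-means. The following is a minimal sketch that combines seeding-based initialization (in the spirit of seeded k-means by Basu et al.) with must-link/cannot-link checks during assignment (in the spirit of COP-KMeans by Wagstaff et al.). It is an illustration of the general technique under stated assumptions, not the algorithm evaluated in this paper. The function name `seeded_cop_kmeans`, the two-feature movie vectors, and the label encoding (1 = award, 0 = no award) are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def _violates(i: int, c: int, assign: Dict[int, int],
              must_link: Sequence[Tuple[int, int]],
              cannot_link: Sequence[Tuple[int, int]]) -> bool:
    """True if putting point i into cluster c breaks a constraint with
    a point that has already been assigned in the current pass."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        if other is not None and other in assign and assign[other] != c:
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and other in assign and assign[other] == c:
            return True
    return False


def seeded_cop_kmeans(points: List[Point], k: int,
                      seeds: Dict[int, int],
                      must_link: Sequence[Tuple[int, int]] = (),
                      cannot_link: Sequence[Tuple[int, int]] = (),
                      max_iter: int = 100) -> Optional[List[int]]:
    """Hypothetical sketch, not the method from the paper.

    seeds: index -> cluster label for the labeled instances; every
    cluster 0..k-1 needs at least one seed. Seeds initialise the
    centroids and keep their labels throughout.
    must_link / cannot_link: pairwise constraints checked when each
    unlabeled point is assigned. Returns None if some point has no
    cluster that satisfies its constraints.
    """
    # Initial centroids: the mean of the labeled instances of each cluster.
    centroids = [_mean([points[i] for i, c in seeds.items() if c == j])
                 for j in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        assign: Dict[int, int] = dict(seeds)  # labeled points stay fixed
        for i, p in enumerate(points):
            if i in assign:
                continue
            # Try clusters from nearest to farthest; keep the first feasible one.
            order = sorted(range(k), key=lambda j: _dist(p, centroids[j]))
            for j in order:
                if not _violates(i, j, assign, must_link, cannot_link):
                    assign[i] = j
                    break
            else:
                return None  # every cluster violates some constraint
        new_labels = [assign[i] for i in range(len(points))]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Recompute centroids; no cluster is empty because each has a seed.
        centroids = [_mean([points[i] for i in range(len(points))
                            if labels[i] == j]) for j in range(k)]
    return labels
```

As a usage example with made-up data, five movies described by two normalized features, with movie 0 labeled as awarded and movie 1 as not awarded: `seeded_cop_kmeans(pts, 2, {0: 1, 1: 0})` puts movie 4 in the award cluster, while adding `cannot_link=[(2, 4)]` moves it to the other cluster. Because points are assigned greedily in index order, as in COP-KMeans, the sketch can report infeasibility (`None`) even when some other ordering would satisfy all constraints.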