Avoiding Bias in Text Clustering Using Constrained K-means and May-Not-Links

  • Authors:
  • M. Eduardo Ares; Javier Parapar; Álvaro Barreiro

  • Affiliations:
  • IRLab, Department of Computer Science, University of A Coruña, Spain (all three authors)

  • Venue:
  • ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory
  • Year:
  • 2009

Abstract

In this paper we present a new clustering algorithm that extends traditional batch k-means so that domain knowledge can be introduced in the form of Must, Cannot, May and May-Not rules between data points. We also apply the method to the task of avoiding bias in clustering. An evaluation on standard collections showed considerable improvements in effectiveness over previous constrained and non-constrained algorithms for this task.
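
To make the idea of a soft May-Not rule concrete, the following is a minimal sketch and not the algorithm of the paper, whose exact formulation the abstract does not give. It assumes that a May-Not rule between two points can be modelled as a fixed penalty added to the usual squared-distance cost whenever both points would share a cluster. The function name `soft_constrained_kmeans`, the `weight` parameter and the penalty form are all hypothetical choices made for illustration; Must, Cannot and May rules are left out.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def soft_constrained_kmeans(points, centroids, may_not, weight=1.0, iters=20):
    """Batch k-means with soft May-Not penalties (illustrative assumption).

    points    -- list of numeric vectors
    centroids -- initial centroids, one per cluster (not modified)
    may_not   -- list of (i, j) index pairs that should preferably be separated
    weight    -- hypothetical penalty paid for each violated May-Not rule
    """
    centroids = [list(c) for c in centroids]
    k = len(centroids)

    # partners[i] holds the points that point i should preferably avoid.
    partners = {i: set() for i in range(len(points))}
    for i, j in may_not:
        partners[i].add(j)
        partners[j].add(i)

    # Start from the plain nearest-centroid assignment.
    labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
              for p in points]

    for _ in range(iters):
        changed = False
        # Centroids stay fixed during the whole pass (batch update);
        # labels are updated in place so a linked pair does not swap
        # back and forth between clusters.
        for i, p in enumerate(points):
            best = min(
                range(k),
                key=lambda c: sq_dist(p, centroids[c])
                + weight * sum(1 for j in partners[i] if labels[j] == c),
            )
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
        # Recompute each centroid from its members; empty clusters keep theirs.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
init = [(0.0, 0.5), (10.0, 0.5)]
print(soft_constrained_kmeans(pts, init, []))                       # [0, 0, 1, 1]
print(soft_constrained_kmeans(pts, init, [(0, 1)], weight=1000.0))  # [1, 0, 1, 1]
```

In the second call the large penalty outweighs the distance term, so the two linked points end up in different clusters. With a small `weight` the rule would act only as a preference that the geometry can override, which is the sense in which a May-Not rule is softer than a Cannot rule.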