Avoiding Bias in Text Clustering Using Constrained K-means and May-Not-Links

Authors:
M. Eduardo Ares; Javier Parapar; Álvaro Barreiro
Affiliations:
IRLab, Department of Computer Science, University of A Coruña, Spain (all authors)
Venue:
ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory
Year:
2009

Citing (14):
Data clustering: a review. ACM Computing Surveys (CSUR)
Machine learning in automated text categorization. ACM Computing Surveys (CSUR)
Document clustering with committees. SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Constrained K-means Clustering with Background Knowledge. ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
From Instance-level Constraints to Space-Level Constraints: Making the Most of Prior Knowledge in Data Clustering. ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Clustering with Instance-level Constraints. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Non-Redundant Data Clustering. ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Document clustering with prior knowledge. SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Near-duplicate detection by instance-level constrained clustering. SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
COALA: A Novel Approach for the Extraction of an Alternate Clustering of High Quality and High Dissimilarity. ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Introduction to Information Retrieval
Constrained Clustering: Advances in Algorithms, Theory, and Applications
Non-redundant Multi-view Clustering via Orthogonalization. ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Finding Alternative Clusterings Using Constraints. ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining

Cited by (3):
Improving alternative text clustering quality in the avoiding bias task with spectral and flat partition algorithms. DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part II
An experimental study of constrained clustering effectiveness in presence of erroneous constraints. Information Processing and Management: an International Journal
Language modelling of constraints for text clustering. ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval

Abstract

In this paper we present a new clustering algorithm that extends traditional batch k-means to allow the introduction of domain knowledge in the form of Must, Cannot, May and May-Not rules between data points. We also apply the method to the task of avoiding bias in clustering. Evaluation on standard collections showed considerable improvements in effectiveness over previous constrained and non-constrained algorithms for this task.
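
The abstract does not say how the four rule types are enforced, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that Must and Cannot rules are hard constraints checked during the assignment step (in the style of constrained k-means with background knowledge) and that May and May-Not rules are soft preferences that subtract or add a fixed amount to the squared distance. The function `constrained_kmeans`, its parameters, and the `weight` value are all hypothetical names introduced for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partners(pairs, n):
    """Symmetric adjacency sets for a collection of (i, j) index pairs."""
    adj = [set() for _ in range(n)]
    for i, j in pairs:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def constrained_kmeans(points, k, must=(), cannot=(), may=(), may_not=(),
                       weight=1.0, iters=20, seed=0):
    """Batch k-means with hard (must/cannot) and soft (may/may_not) rules.

    Assumption: soft rules shift the assignment cost by `weight`; the paper
    may define them differently.
    """
    n = len(points)
    must_p, cannot_p = partners(must, n), partners(cannot, n)
    may_p, may_not_p = partners(may, n), partners(may_not, n)
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    labels = [None] * n
    for _ in range(iters):
        new_labels = [None] * n
        for i, p in enumerate(points):
            best, best_cost = None, None
            for c in range(k):
                # Hard rules: only points already assigned in this pass count.
                if any(new_labels[j] is not None and new_labels[j] != c
                       for j in must_p[i]):
                    continue
                if any(new_labels[j] == c for j in cannot_p[i]):
                    continue
                # Soft rules: penalise May-Not partners, reward May partners.
                cost = sq_dist(p, centroids[c])
                cost += weight * sum(new_labels[j] == c for j in may_not_p[i])
                cost -= weight * sum(new_labels[j] == c for j in may_p[i])
                if best_cost is None or cost < best_cost:
                    best, best_cost = c, cost
            if best is None:
                raise ValueError("no cluster satisfies the hard rules")
            new_labels[i] = best
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [points[i] for i in range(n) if labels[i] == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels


# Toy usage: four 2-D "documents", one Must rule and one Cannot rule.
docs = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
labels = constrained_kmeans(docs, k=2, must=[(0, 1)], cannot=[(0, 2)])
```

In this sketch, points are processed in index order, so a hard rule only takes effect once one of its two points has been assigned in the current pass. For the avoiding-bias task, one could imagine deriving May-Not pairs from documents that share a cluster in a known, unwanted clustering; that mapping is an assumption here and is not stated in the abstract.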