An experimental study of constrained clustering effectiveness in presence of erroneous constraints

Authors:
M. Eduardo Ares;Javier Parapar;Álvaro Barreiro
Affiliations:
IRLab, Department of Computer Science, University of A Coruña, Campus de Elviña, 15071 A Coruña, Spain;IRLab, Department of Computer Science, University of A Coruña, Campus de Elviña, 15071 A Coruña, Spain;IRLab, Department of Computer Science, University of A Coruña, Campus de Elviña, 15071 A Coruña, Spain
Venue:
Information Processing and Management: an International Journal
Year:
2012

Abstract

Recently, a new family of semi-supervised clustering algorithms, known as constrained clustering, has emerged. These algorithms can incorporate a priori domain knowledge into the clustering process, allowing the user to guide the method. The vast majority of studies of their effectiveness have used information, in the form of constraints, that was totally accurate. This is the ideal case, but it is unattainable in most realistic settings because of errors in the constraint creation process, misjudgements by the user, inconsistent information, etc. Hence, the robustness of constrained clustering algorithms to erroneous constraints is bound to play an important role in their final effectiveness. In this paper we study the behaviour of four constrained clustering algorithms (Constrained k-Means, Soft Constrained k-Means, Constrained Normalised Cut and Normalised Cut with Imposed Constraints) when not all the information supplied to them is accurate. Experiments over text and numeric datasets using two different noise models, one of them an original approach based on similarities, highlight the strengths and weaknesses of each method when working with positive and negative constraints, and indicate the scenarios in which each algorithm is more appropriate.
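
As a rough illustration of what "erroneous constraints" can mean in practice, the sketch below derives pairwise constraints from known class labels and then corrupts a fixed fraction of them. It is not the paper's method: the abstract does not specify its two noise models, so this assumes the simplest possible one (flipping constraint types uniformly at random). The function names (`make_constraints`, `add_noise`, `count_errors`) and the encoding of a positive constraint as `True` and a negative one as `False` are hypothetical choices made for this example.

```python
import random
from itertools import combinations


def make_constraints(labels, n_pairs, rng):
    """Sample n_pairs distinct item pairs and derive a constraint from the true labels."""
    all_pairs = list(combinations(range(len(labels)), 2))
    pairs = rng.sample(all_pairs, n_pairs)
    # True = positive constraint (same class), False = negative (different classes).
    return [(i, j, labels[i] == labels[j]) for i, j in pairs]


def add_noise(constraints, error_rate, rng):
    """Flip the type of a fixed fraction of constraints, chosen uniformly at random.

    Assumption: uniform flipping is only a stand-in noise model for illustration.
    """
    n_flip = round(error_rate * len(constraints))
    flip = set(rng.sample(range(len(constraints)), n_flip))
    return [
        (i, j, (not positive) if k in flip else positive)
        for k, (i, j, positive) in enumerate(constraints)
    ]


def count_errors(constraints, labels):
    """Count constraints that contradict the true labels."""
    return sum(positive != (labels[i] == labels[j]) for i, j, positive in constraints)


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    rng = random.Random(0)
    clean = make_constraints(labels, 20, rng)
    noisy = add_noise(clean, 0.25, rng)
    print(count_errors(clean, labels), count_errors(noisy, labels))
```

Feeding the `noisy` list instead of the `clean` one to a constrained clustering algorithm, at increasing values of `error_rate`, is one way such a robustness experiment could be set up.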