A cluster-level semi-supervision model for interactive clustering

Authors:
Avinava Dubey;Indrajit Bhattacharya;Shantanu Godbole
Affiliations:
IBM Research - India;Indian Institute of Science;IBM Research - India
Venue:
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part I
Year:
2010

Abstract

Semi-supervised clustering models that incorporate user-provided constraints to yield meaningful clusters have recently become a popular area of research. In this paper, we propose a cluster-level semi-supervision model for interactive clustering. Prototype-based clustering algorithms typically alternate between updating cluster descriptions and assigning data items to clusters. In our model, the user provides semi-supervision directly for these two steps. Assignment feedback re-assigns data items among existing clusters, while cluster description feedback helps to position existing cluster centers more meaningfully. We argue that, compared with the pair-wise instance-level supervision model, such supervision is more natural for exploratory data mining, where the user discovers and interprets clusters as the algorithm progresses, particularly for high-dimensional data such as document collections. We show how such feedback can be interpreted as constraints and incorporated within the k-means clustering framework. Using experimental results on multiple real-world datasets, we show that this framework improves clustering performance significantly beyond traditional k-means. Interestingly, given the same amount of user feedback, the proposed framework significantly outperforms the pair-wise supervision model.
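
The alternation described in the abstract, together with assignment feedback, can be illustrated with a minimal sketch. This is not the authors' formulation: it assumes the simplest possible reading, in which each piece of assignment feedback is treated as a hard constraint that pins one data item to a user-chosen cluster, and it does not model cluster description feedback at all. The function name `kmeans_with_assignment_feedback` and the `pinned` parameter are hypothetical names introduced only for this illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def kmeans_with_assignment_feedback(points, centers, pinned=None, iterations=10):
    """k-means in which user feedback pins some items to clusters.

    `pinned` maps a point index to the cluster index the user chose for it.
    This hard-constraint treatment is an assumption of this sketch.
    """
    pinned = pinned or {}
    centers = [tuple(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: pinned items follow the feedback,
        # all other items go to their nearest center.
        for i, p in enumerate(points):
            if i in pinned:
                labels[i] = pinned[i]
            else:
                labels[i] = min(
                    range(len(centers)),
                    key=lambda k: squared_distance(p, centers[k]),
                )
        # Update step: each center moves to the mean of its members;
        # a cluster with no members keeps its previous center.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = mean(members)
    return labels, centers


points = [(0, 0), (0, 1), (4, 0), (10, 0), (10, 1)]
initial_centers = [(0, 0), (10, 0)]

# Plain k-means places the middle point (index 2) in cluster 0.
plain_labels, _ = kmeans_with_assignment_feedback(points, initial_centers)

# Feedback "item 2 belongs to cluster 1" overrides that choice and
# pulls the center of cluster 1 toward the re-assigned item.
fb_labels, fb_centers = kmeans_with_assignment_feedback(
    points, initial_centers, pinned={2: 1}
)
```

In this toy run a single piece of feedback changes both the assignment of the pinned item and the position of the receiving cluster's center, which is the sense in which feedback on one step influences the other.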