Algorithms for clustering data.
Elements of information theory.
Distributional clustering of words for text classification. Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Agnostic classification of Markovian sequences. Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems 10 (NIPS '97).
An on-line agglomerative clustering method for nonstationary data. Neural Computation.
Document clustering using word clusters via the information bottleneck method. Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '00).
Distributional word clusters vs. words for text categorization. The Journal of Machine Learning Research.
A semi-supervised feature clustering algorithm with application to word sense disambiguation. Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT '05).
Information marginalization on subgraphs. Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD '06).
A supervised clustering method for text classification. Proceedings of the 6th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing '05).
Constrained co-clustering with non-negative matrix factorisation. International Journal of Business Intelligence and Data Mining.
Hierarchical co-clustering based on entropy splitting. Proceedings of the 21st ACM International Conference on Information and Knowledge Management.
This paper studies the Iterative Double Clustering (IDC) meta-clustering algorithm, a new extension of the recent Double Clustering (DC) method of Slonim and Tishby, which exhibited impressive performance on text categorization tasks [1]. Using synthetically generated data, we empirically demonstrate that whenever the DC procedure succeeds in recovering some of the structure hidden in the data, the extended IDC procedure can incrementally compute a dramatically better classification at minor additional computational cost. We show that the IDC algorithm is especially advantageous when the data exhibit high attribute noise. Our simulation results also show the effectiveness of IDC on text categorization problems. Surprisingly, this unsupervised procedure can be competitive with a (supervised) SVM trained on a small training set. Finally, we propose a natural extension of IDC for (semi-supervised) transductive learning, where we are given both labeled and unlabeled examples, and present preliminary empirical results supporting the plausibility of the extended method in a semi-supervised setting.
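To make the double-clustering loop concrete, the following is a minimal toy sketch of the iteration scheme the abstract describes: cluster words, re-represent each document by its counts over word clusters, cluster documents, and then feed the document clusters back into the next word-clustering round. It is a deliberate simplification and not the authors' method — the actual DC/IDC algorithms use information-bottleneck clustering, whereas this sketch substitutes plain k-means (with a deterministic farthest-point initialization), and the function names and parameters are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, iters=20):
    # Plain k-means with deterministic farthest-point initialization
    # (a stand-in for the information-bottleneck clustering used in DC/IDC).
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        dists = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dists.argmax()])
    centers = np.array(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

def iterative_double_clustering(counts, k_words, k_docs, rounds=3):
    # counts: (n_docs, n_words) word-occurrence matrix.
    # Each round: cluster words, summarize documents by word-cluster
    # counts, cluster documents, then reuse the document clusters to
    # summarize words in the next round (the "iterative" part of IDC).
    counts = np.asarray(counts, dtype=float)
    doc_labels = None
    for _ in range(rounds):
        if doc_labels is None:
            word_repr = counts.T  # first round: words as raw doc-count rows
        else:
            word_repr = np.stack(  # words summarized over document clusters
                [counts[doc_labels == c].sum(axis=0) for c in range(k_docs)],
                axis=1)
        word_labels = kmeans(word_repr, k_words)
        doc_repr = np.stack(  # documents summarized over word clusters
            [counts[:, word_labels == c].sum(axis=1) for c in range(k_words)],
            axis=1)
        doc_labels = kmeans(doc_repr, k_docs)
    return doc_labels
```

On a count matrix with two clean blocks (one group of documents using one group of words, a second group using the rest), the loop recovers the document partition after the first round; the iteration matters when attribute noise blurs the blocks, which is the regime where the abstract reports IDC's advantage.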