Feature diversity in cluster ensembles for robust document clustering

Authors:
Xavier Sevillano;Germán Cobo;Francesc Alías;Joan Claudi Socoró
Affiliations:
Ramon Llull University, Barcelona, Spain;Ramon Llull University, Barcelona, Spain;Ramon Llull University, Barcelona, Spain;Ramon Llull University, Barcelona, Spain
Venue:
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2006

Abstract

The performance of document clustering systems depends on employing optimal text representations, which are not only difficult to determine beforehand but may also vary from one clustering problem to another. As a first step towards building robust document clusterers, a strategy based on feature diversity and cluster ensembles is presented in this work. Experiments conducted on a binary clustering problem show that our method is robust to near-optimal model order selection and able to detect constructive interactions between different document representations in the test bed.
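
To make the idea of combining clusterings from diverse document representations concrete, the following is a minimal sketch, not the authors' method. The abstract does not name a consensus function, so the sketch assumes a simple co-association majority vote: two documents end up in the same consensus cluster when more than half of the base clusterings group them together. The representation names in the comments and all label vectors are hypothetical illustrations, not data from the paper.

```python
from itertools import combinations


def consensus(partitions, threshold=0.5):
    """Merge base clusterings by co-association majority vote (assumed technique).

    partitions: list of label lists, one per base clustering, all of equal length.
    Documents i and j are linked when the fraction of base clusterings that
    place them in the same cluster exceeds `threshold`; the consensus clusters
    are the connected components of those links.
    """
    n = len(partitions[0])
    m = len(partitions)
    parent = list(range(n))  # union-find forest over documents

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        votes = sum(1 for labels in partitions if labels[i] == labels[j])
        if votes / m > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical base clusterings of six documents, one per representation.
partitions = [
    [0, 0, 0, 1, 1, 1],  # e.g. raw term counts
    [1, 1, 1, 0, 0, 0],  # e.g. tf-idf weights (same grouping, labels swapped)
    [0, 0, 1, 1, 1, 1],  # e.g. a reduced-dimension space (document 2 misplaced)
]
labels = consensus(partitions)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because the vote looks only at whether pairs of documents share a cluster, it is unaffected by how each base clustering happens to number its clusters, and a single representation that misplaces a document is outvoted by the others.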