Interactive feature selection for document clustering

Authors:
Yeming Hu;Evangelos E. Milios;James Blustein
Affiliations:
Dalhousie University, Halifax, Canada;Dalhousie University, Halifax, Canada;Dalhousie University
Venue:
Proceedings of the 2011 ACM Symposium on Applied Computing
Year:
2011

Abstract

Traditional document clustering techniques group similar documents without any user interaction. Although such methods minimize user effort, the clusters they generate often do not match their users' conception of the document collection. In this paper we describe a new framework, and experiments with it, exploring how clustering might be improved by adding user supervision at the level of selecting the features used to distinguish between documents. Our features are based on the words that appear in documents (see §4.1 for details). We conjecture that user input at the feature level can yield clusters that better match user expectations. To test this conjecture, we propose a novel iterative framework in which users interactively select the features used to cluster documents. Unlike existing semi-supervised clustering, which asks users to label constraints between documents, this framework asks users to label features. The proposed method ranks all features against the most recent clusters using cluster-based feature selection and presents a list of highly ranked features to users for labeling. The feature set for the next clustering iteration includes both the features accepted by users and other highly ranked features. Experimental results on several real datasets demonstrate that the feature set obtained with the new interactive framework can produce clusters that better match the user's expectations. Moreover, we quantify and evaluate the effect of reweighting previously accepted features and the effect of user effort.
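
The loop described in the abstract (cluster, rank features against the latest clusters, ask the user to label the top-ranked ones, then re-cluster on the accepted plus other highly ranked features) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the clusterer is plain k-means with a deterministic start, the ranking score is a simple inside-versus-outside document-frequency difference standing in for the paper's cluster-based feature selection, the user is simulated by an `oracle` function, and the names `interactive_feature_selection`, `show`, `extra`, and `boost` are hypothetical.

```python
from collections import Counter

def vectorize(doc, features, weights):
    """Weighted term-count vector restricted to the current feature set."""
    counts = Counter(doc.split())
    return [counts[f] * weights.get(f, 1.0) for f in features]

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-first initialisation (assumed clusterer)."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(sqdist(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sqdist(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def rank_features(docs, labels, vocab, k):
    """Rank words by document frequency inside their best cluster minus outside it.
    This score is a stand-in, not the measure used in the paper."""
    doc_words = [set(d.split()) for d in docs]
    def score(f):
        best = 0.0
        for j in range(k):
            inside = [f in w for w, l in zip(doc_words, labels) if l == j]
            outside = [f in w for w, l in zip(doc_words, labels) if l != j]
            if inside and outside:
                best = max(best, sum(inside) / len(inside) - sum(outside) / len(outside))
        return best
    return sorted(vocab, key=lambda f: (-score(f), f))

def interactive_feature_selection(docs, k, oracle, rounds=3, show=4, extra=4, boost=2.0):
    vocab = sorted({w for d in docs for w in d.split()})
    features, accepted, seen = list(vocab), set(), set()
    for _ in range(rounds):
        weights = {f: boost for f in accepted}      # reweight accepted features
        labels = kmeans([vectorize(d, features, weights) for d in docs], k)
        ranked = rank_features(docs, labels, vocab, k)
        shown = [f for f in ranked if f not in seen][:show]  # list presented to the user
        seen.update(shown)
        accepted.update(f for f in shown if oracle(f))       # user accepts or rejects
        others = [f for f in ranked if f not in accepted][:extra]
        features = sorted(accepted) + others        # feature set for the next iteration
    weights = {f: boost for f in accepted}
    labels = kmeans([vectorize(d, features, weights) for d in docs], k)
    return labels, features, accepted

# Toy collection; the simulated user accepts topical words and rejects generic ones.
docs = [
    "goal match team report",
    "team goal striker news",
    "match striker goal report",
    "vote election party report",
    "party vote senate news",
    "election senate vote report",
]
wanted = {"goal", "match", "team", "striker", "vote", "election", "party", "senate"}
labels, features, accepted = interactive_feature_selection(docs, k=2, oracle=wanted.__contains__)
print(labels, sorted(accepted))
```

In this sketch the `boost` parameter plays the role of reweighting previously accepted features, and `rounds` times `show` bounds the number of labeling questions, which is one rough way to express user effort.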