Personalized document clustering with dual supervision

Authors:
Yeming Hu;Evangelos E. Milios;James Blustein;Shali Liu
Affiliations:
Dalhousie University, Halifax, NS, Canada;Dalhousie University, Halifax, NS, Canada;Dalhousie University, Halifax, NS, Canada;Dalhousie University, Halifax, NS, Canada
Venue:
Proceedings of the 2012 ACM symposium on Document engineering
Year:
2012

Abstract

The potential of semi-supervised techniques to produce personalized clusters has not been explored, because semi-supervised clustering algorithms have typically been evaluated using oracles based on underlying class labels. Although oracles allow clustering algorithms to be evaluated quickly and without labor-intensive labeling, they have a key disadvantage: an oracle always gives the same answer for the assignment of a given document or feature. Different human users, however, may assign the same document or feature differently because they hold different but equally valid points of view. In this paper, we conduct a user study in which participants (users) group the same document collection into clusters according to their own understanding; these groupings are then used to evaluate semi-supervised clustering algorithms for user personalization. The study shows that different users organize the same collection in their own personalized ways and that a user's organization changes over time. We therefore propose that document clustering algorithms should be able to incorporate user input and produce personalized clusters based on it. We also confirm that semi-supervised algorithms, even with noisy user input, can still produce organizations that match the user's expectations (personalization) better than traditional unsupervised algorithms do. Finally, we demonstrate that, with respect to user personalization, labeling keywords for clusters at the same time as labeling documents improves clustering performance further than labeling documents alone.
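To make the evaluation idea concrete, the following is a minimal sketch of scoring one clustering against two users' groupings of the same documents. It assumes pairwise agreement (the Rand index) as the measure, which is an illustrative choice and not necessarily the metric used in the paper. The function name `rand_index` and all label lists are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of document pairs on which two groupings agree.

    A pair counts as an agreement when both groupings put the two
    documents together, or both keep them apart.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("groupings must cover the same documents")
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical data: six documents grouped by two users and by one algorithm.
# Each list gives the cluster id assigned to documents 0..5.
user_a = [0, 0, 1, 1, 2, 2]   # user A sees three topics
user_b = [0, 0, 0, 1, 1, 1]   # user B sees two topics
algorithm = [0, 0, 1, 1, 1, 1]

# The same clustering is scored separately against each user's grouping.
score_a = rand_index(algorithm, user_a)
score_b = rand_index(algorithm, user_b)
print(f"agreement with user A: {score_a:.3f}")
print(f"agreement with user B: {score_b:.3f}")
```

An oracle built from a single set of class labels would give only one reference grouping, so one clustering would receive one score. Scoring against each user's own grouping, as sketched here, is what lets the same output fit one user better than another.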