The potential for semi-supervised techniques to produce personalized clusters has not been explored, because semi-supervised clustering algorithms have typically been evaluated using oracles based on underlying class labels. Oracles allow clustering algorithms to be evaluated quickly and without labor-intensive labeling, but they have a key disadvantage: an oracle always gives the same answer for the assignment of a given document or feature. Different human users, by contrast, might assign the same document or feature differently because they hold different but equally valid points of view. In this paper, we conduct a user study in which we ask participants (users) to group the same document collection into clusters according to their own understanding; these groupings are then used to evaluate semi-supervised clustering algorithms for user personalization. Through the study, we observe that different users organize the same collection in their own personalized ways and that a user's organization changes over time. We therefore propose that document clustering algorithms should be able to incorporate user input and produce personalized clusters based on it. We also confirm that semi-supervised algorithms given noisy user input can still produce organizations that match the user's expectations (personalization) better than those produced by traditional unsupervised algorithms. Finally, we demonstrate that, with respect to user personalization, labeling keywords for clusters at the same time as labeling documents improves clustering performance further than labeling documents alone.