The Journal of Machine Learning Research
Automatic labeling of multinomial topic models
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Evaluation methods for topic models
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Cheap and fast---but is it good?: evaluating non-expert annotations for natural language tasks
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Creating speech and language data with Amazon's Mechanical Turk
CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Perspectives on crowdsourcing annotations for natural language processing
Language Resources and Evaluation
Hi-index | 0.00 |
Probabilistic topic models are a popular tool for the unsupervised analysis of text, providing both a predictive model of future text and a latent topic representation of the corpus. Recent studies have found that while there are suggestive connections between topic models and the way humans interpret data, these two often disagree. In this paper, we explore this disagreement from the perspective of the learning process rather than the output. We present a novel task, tag-and-cluster, which asks subjects to simultaneously annotate documents and cluster those annotations. We use these annotations as a novel approach for constructing a topic model, grounded in human interpretations of documents. We demonstrate that these topic models have features which distinguish them from traditional topic models.