Incorporating lexical priors into topic models

Authors:
Jagadeesh Jagarlamudi;Hal Daumé, III;Raghavendra Udupa
Affiliations:
University of Maryland, College Park;University of Maryland, College Park;Microsoft Research Bangalore, India
Venue:
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Year:
2012

Citing 14
Cited 3

Machine Learning

Machine Learning
Constrained K-means Clustering with Background Knowledge

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Latent dirichlet allocation

The Journal of Machine Learning Research
RCV1: A New Benchmark Collection for Text Categorization Research

The Journal of Machine Learning Research
A bootstrapping method for learning semantic lexicons using extraction pattern contexts

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Topic modeling: beyond bag-of-words

ICML '06 Proceedings of the 23rd international conference on Machine learning
Prototype-driven learning for sequence models

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Comparing clusterings---an information based distance

Journal of Multivariate Analysis
Constrained Clustering: Advances in Algorithms, Theory, and Applications

Constrained Clustering: Advances in Algorithms, Theory, and Applications
Incorporating domain knowledge into topic modeling via Dirichlet Forest priors

ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Improving nonparameteric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Latent Dirichlet Allocation with topic-in-set knowledge

SemiSupLearn '09 Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing
Labeled LDA: a supervised topic model for credit attribution in multi-labeled corpora

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
Interactive topic modeling

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1

Incorporating word correlation into tag-topic model for semantic knowledge acquisition

Proceedings of the 21st ACM international conference on Information and knowledge management
Discovering coherent topics using general knowledge

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Leveraging multi-domain prior knowledge in topic models

IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

Topic models have great potential for helping users understand document corpora. This potential is stymied by their purely unsupervised nature, which often leads to topics that are neither entirely meaningful nor effective in extrinsic tasks (Chang et al., 2009). We propose a simple and effective way to guide topic models to learn topics of specific interest to a user. We achieve this by providing sets of seed words that a user believes are representative of the underlying topics in a corpus. Our model uses these seeds to improve both topic-word distributions (by biasing topics to produce appropriate seed words) and to improve document-topic distributions (by biasing documents to select topics related to the seed words they contain). Extrinsic evaluation on a document clustering task reveals a significant improvement when using seed information, even over other models that use seed information naïvely.