Bayesian checking for topic models

Authors:
David Mimno; David Blei
Affiliations:
Princeton University, Princeton, NJ; Princeton University, Princeton, NJ
Venue:
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2011

Abstract

Real document collections do not fit the independence assumptions asserted by most statistical topic models, but how badly do they violate them? We present a Bayesian method for measuring how well a topic model fits a corpus. Our approach is based on posterior predictive checking, a method for diagnosing Bayesian models in user-defined ways. Our method can identify where a topic model fits the data, where it falls short, and in which directions it might be improved.
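
To make the idea of a predictive check concrete, here is a minimal sketch in Python. It is not the authors' method: it assumes a single word whose tokens the model places in documents independently, with probability proportional to document length, and it simulates replicated data from that fixed assumption rather than from posterior samples. The discrepancy function (variance of per-document counts) and all names such as `predictive_p_value` are hypothetical choices made for illustration.

```python
import random


def discrepancy(counts):
    """Variance of a word's per-document counts (an illustrative choice)."""
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts) / len(counts)


def replicate(total, doc_lengths, rng):
    """Place `total` tokens in documents independently, with probability
    proportional to document length (the assumed independence model)."""
    counts = [0] * len(doc_lengths)
    for d in rng.choices(range(len(doc_lengths)), weights=doc_lengths, k=total):
        counts[d] += 1
    return counts


def predictive_p_value(observed, doc_lengths, n_reps=2000, seed=0):
    """Fraction of replicated data sets whose discrepancy is at least as
    large as the observed one. Values near 0 suggest the observed counts
    are more extreme than the independence assumption can explain."""
    rng = random.Random(seed)
    observed_value = discrepancy(observed)
    total = sum(observed)
    exceed = sum(
        discrepancy(replicate(total, doc_lengths, rng)) >= observed_value
        for _ in range(n_reps)
    )
    return exceed / n_reps


# Hypothetical data: four equally long documents, 20 tokens of one word.
doc_lengths = [100, 100, 100, 100]
p_bursty = predictive_p_value([20, 0, 0, 0], doc_lengths)  # all in one document
p_even = predictive_p_value([5, 5, 5, 5], doc_lengths)     # spread evenly
```

In this toy setting the word concentrated in a single document yields a p-value near zero, flagging a misfit, while the evenly spread word does not. A full posterior predictive check would additionally draw the model parameters from the posterior before generating each replicated data set.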