There has been an explosion in the amount of digital text available in recent years, creating challenges of scale for traditional inference algorithms for topic models. Recent advances in stochastic variational inference for latent Dirichlet allocation (LDA) have made it feasible to learn topic models on very large corpora, but these methods do not currently take full advantage of the collapsed representation of the model. We propose a stochastic algorithm for collapsed variational Bayesian inference for LDA which is simpler and more efficient than the state-of-the-art method. In experiments on large-scale text corpora, the algorithm converged faster, and often to a better solution, than previous methods. Human-subject experiments further demonstrated that the method can learn coherent topics in seconds on small corpora, facilitating the use of topic models in interactive document analysis software.
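To make the idea concrete, the following is a minimal sketch of a stochastic collapsed variational update for LDA in the style the abstract describes: a zeroth-order collapsed variational step computes a topic responsibility for each token from expected count statistics, and the statistics are then updated by stochastic averaging. The toy corpus, hyperparameter values, variable names, and the one-pass, one-document minibatch schedule are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

W, K = 6, 2              # vocabulary size and number of topics (toy values)
alpha, eta = 0.1, 0.01   # Dirichlet hyperparameters (assumed settings)

# Toy corpus: each document is a list of word-token ids in [0, W).
docs = [[0, 1, 0, 2], [3, 4, 5, 4], [0, 2, 1], [4, 5, 3]]
n_tokens = sum(len(d) for d in docs)

# Expected count statistics maintained in the collapsed representation.
N_phi = rng.random((W, K))   # expected word-topic counts
N_z = N_phi.sum(axis=0)      # expected per-topic totals

for step in range(200):
    rho = 1.0 / (10 + step) ** 0.9   # Robbins-Monro step-size schedule
    d = docs[step % len(docs)]       # minibatch of one document
    C_d = len(d)
    N_theta = rng.random(K)          # document-topic counts, re-initialised

    for w in d:
        # Zeroth-order collapsed variational responsibility for this token.
        gamma = (N_phi[w] + eta) * (N_theta + alpha) / (N_z + W * eta)
        gamma /= gamma.sum()
        # Stochastic averaging of the document statistics...
        N_theta = (1 - rho) * N_theta + rho * C_d * gamma
        # ...and of the corpus statistics, scaled to corpus size.
        N_phi *= (1 - rho)
        N_phi[w] += rho * n_tokens * gamma
        N_z = (1 - rho) * N_z + rho * n_tokens * gamma

# Recover the topic-word distributions from the expected counts.
phi = (N_phi + eta) / (N_z + W * eta)
```

Because the per-token work is a single multiply-normalise step over expected counts, with no digamma evaluations or per-token variational distributions to store, each update is cheap, which is the source of the speed the abstract reports.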