On smoothing and inference for topic models

Authors: Arthur Asuncion, Max Welling, Padhraic Smyth, Yee Whye Teh
Affiliations: University of California, Irvine, CA (Asuncion, Welling, Smyth); University College London, London, UK (Teh)
Venue: UAI '09, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence
Year: 2009

Abstract

Latent Dirichlet allocation (LDA), or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling, variational inference, and maximum a posteriori estimation, and this variety motivates the need for careful empirical comparisons. In this paper, we highlight the close connections between these approaches. We find that the main differences among them are attributable to the amount of smoothing each applies to the counts. When the hyperparameters are optimized, the differences in performance among the algorithms diminish significantly. Because these algorithms reach solutions of comparable accuracy, we are free to select the most computationally efficient one. Using the insights gained from this comparative study, we show how accurate topic models can be learned in several seconds on text corpora with thousands of documents.
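The claim that the algorithms differ mainly in how much smoothing they apply to the counts can be made concrete by comparing the per-token topic responsibilities each method computes. The Python sketch below assumes the standard update forms for maximum likelihood (PLSA, no smoothing), MAP, variational Bayes (VB), and the zeroth-order collapsed variational update (CVB0). Here N_wk, N_k, and N_kj denote word-topic, topic, and document-topic counts, alpha and eta are the document and topic Dirichlet hyperparameters, and W is the vocabulary size. Collapsed Gibbs sampling draws a topic from the same distribution as CVB0, but computed with sampled integer counts rather than expected counts. The function names (`digamma`, `responsibilities`) and the toy counts are hypothetical, and the MAP branch assumes alpha, eta > 1 so that its offsets stay positive.

```python
import math


def digamma(x: float) -> float:
    """Digamma psi(x) for x > 0: upward recurrence, then the asymptotic series."""
    result = 0.0
    # psi(x) = psi(x + 1) - 1/x; shift x until the series below is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def responsibilities(n_wk, n_k, n_kj, alpha, eta, num_words, method):
    """Normalized topic responsibilities for one token of word w in document j.

    n_wk[k]: (expected) count of word w assigned to topic k
    n_k[k]:  (expected) total count of topic k
    n_kj[k]: (expected) count of topic k in document j
    For "cvb0", the counts are assumed to already exclude the current token.
    """
    W = num_words
    scores = []
    for wk, k_total, kj in zip(n_wk, n_k, n_kj):
        if method == "ml":  # PLSA: raw counts, no smoothing
            s = wk / k_total * kj
        elif method == "map":  # offsets eta - 1 and alpha - 1 (assumes eta, alpha > 1)
            s = (wk + eta - 1) / (k_total + W * eta - W) * (kj + alpha - 1)
        elif method == "vb":  # exp(psi(n)) behaves roughly like n - 0.5
            s = math.exp(digamma(wk + eta) - digamma(k_total + W * eta) + digamma(kj + alpha))
        elif method == "cvb0":  # offsets eta and alpha; same form as the CGS conditional
            s = (wk + eta) / (k_total + W * eta) * (kj + alpha)
        else:
            raise ValueError(f"unknown method: {method}")
        scores.append(s)
    total = sum(scores)
    return [s / total for s in scores]


if __name__ == "__main__":
    # Hypothetical toy counts for K = 3 topics and a vocabulary of W = 4 words.
    n_wk, n_k, n_kj = [5.0, 1.0, 0.5], [20.0, 12.0, 8.0], [3.0, 0.5, 2.0]
    for m in ("ml", "map", "vb", "cvb0"):
        gamma = responsibilities(n_wk, n_k, n_kj, alpha=2.0, eta=1.5, num_words=4, method=m)
        print(m, [round(g, 3) for g in gamma])
```

Because the MAP update is the CVB0 formula with alpha and eta each reduced by one, `responsibilities(n_wk, n_k, n_kj, alpha=2.0, eta=1.5, num_words=4, method="map")` returns the same values as `method="cvb0"` with `alpha=1.0, eta=0.5`. VB falls roughly in between, since exp(psi(n)) is close to n - 0.5 for n greater than about 1. This is one way to see why tuning the hyperparameters can close much of the gap between the methods.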