Latent Dirichlet allocation (LDA), or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling, variational inference, and maximum a posteriori estimation, and this variety motivates the need for careful empirical comparisons. In this paper, we highlight the close connections between these approaches. We find that the main differences are attributable to the amount of smoothing applied to the counts. When the hyperparameters are optimized, the differences in performance among the algorithms diminish significantly. The ability of these algorithms to achieve solutions of comparable accuracy gives us the freedom to select computationally efficient approaches. Using the insights gained from this comparative study, we show how accurate topic models can be learned in several seconds on text corpora with thousands of documents.
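The claim that the algorithms differ mainly in how much smoothing they apply to the counts can be made concrete by comparing their per-token update weights. The sketch below is an illustration, not code from the paper. It assumes the standard LDA count notation (word-topic counts `n_wk`, document-topic counts `n_kj`, topic totals `n_k`, symmetric priors `alpha` and `eta`, vocabulary size `W`) and the commonly stated update forms for MAP estimation, variational Bayes (VB), and the collapsed Gibbs full conditional. The function `topic_weights`, its parameters, and the hand-rolled `digamma` helper are hypothetical names chosen for this example.

```python
import math
from collections.abc import Sequence


def digamma(x: float) -> float:
    """Digamma function psi(x) for x > 0, via recurrence plus an asymptotic series."""
    result = 0.0
    # Use psi(x) = psi(x + 1) - 1/x to shift x into the range where the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6) + 1/(240x^8) - 1/(132x^10)
    series = f * (1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))
    return result + math.log(x) - 0.5 / x - series


def topic_weights(
    n_wk: Sequence[float],  # count of word w assigned to each topic k
    n_kj: Sequence[float],  # count of tokens in document j assigned to each topic k
    n_k: Sequence[float],   # total count of tokens assigned to each topic k
    alpha: float,           # symmetric document-topic prior
    eta: float,             # symmetric topic-word prior
    vocab_size: int,        # W, the number of distinct words
    rule: str,              # "map", "vb", or "cgs" (hypothetical labels)
) -> list[float]:
    """Normalized per-topic weights for one token of word w in document j.

    For "cgs" the counts must exclude the current token; the result is the
    distribution a collapsed Gibbs sampler would draw the token's topic from.
    The "map" rule is only well defined for alpha > 1 and eta > 1.
    """
    W = vocab_size
    scores = []
    for k in range(len(n_k)):
        if rule == "map":
            # MAP: the priors act as offsets of (prior - 1) on the counts.
            s = (n_wk[k] + eta - 1) * (n_kj[k] + alpha - 1) / (n_k[k] + W * eta - W)
        elif rule == "vb":
            # VB: exp(digamma(n)) behaves roughly like n - 0.5 for n > 1.
            s = math.exp(
                digamma(n_wk[k] + eta) - digamma(n_k[k] + W * eta) + digamma(n_kj[k] + alpha)
            )
        elif rule == "cgs":
            # Collapsed Gibbs conditional: the full prior is added to the counts.
            s = (n_wk[k] + eta) * (n_kj[k] + alpha) / (n_k[k] + W * eta)
        else:
            raise ValueError(f"unknown rule: {rule!r}")
        scores.append(s)
    total = sum(scores)
    return [s / total for s in scores]


if __name__ == "__main__":
    # Topic 0 has already seen word w five times; topic 1 has never seen it.
    for rule in ("map", "vb", "cgs"):
        weights = topic_weights([5, 0], [3, 3], [50, 50], alpha=1.1, eta=1.1,
                                vocab_size=10, rule=rule)
        print(rule, round(weights[1], 3))
```

Running the `__main__` block prints the weight each rule gives to topic 1, which has never seen the word. That weight grows from MAP to VB to the collapsed Gibbs conditional, tracking the effective offset each rule applies to the prior (roughly -1, roughly -0.5, and 0). This is the smoothing difference the abstract points to, and it is why tuning the hyperparameters can bring the algorithms' results closer together.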