Many different topic models have been used successfully for a variety of applications. However, even state-of-the-art topic models suffer from the important flaw that they do not capture the tendency of words to appear in bursts; it is a fundamental property of language that if a word is used once in a document, it is more likely to be used again. We introduce a topic model that uses Dirichlet compound multinomial (DCM) distributions to model this burstiness phenomenon. On both text and non-text datasets, the new model achieves better held-out likelihood than standard latent Dirichlet allocation (LDA). It is straightforward to incorporate the DCM extension into topic models that are more complex than LDA.
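The burstiness effect is easy to demonstrate numerically. Below is a minimal sketch (ours, not the authors'; the function names, the scipy-based implementation, and the toy counts are all illustrative assumptions) comparing the log-likelihood that a plain multinomial and a Dirichlet compound multinomial (Pólya) distribution assign to a bursty document versus an evenly spread one of the same length and expected word frequencies.

```python
import numpy as np
from scipy.special import gammaln

def multinomial_loglik(counts, p):
    """Log-likelihood of word counts under a fixed multinomial,
    omitting the multinomial coefficient (shared across models
    for a given count vector)."""
    counts = np.asarray(counts, dtype=float)
    return float(np.sum(counts * np.log(p)))

def dcm_loglik(counts, alpha):
    """Log-likelihood under the Dirichlet compound multinomial
    (Polya) distribution, again without the multinomial coefficient:
    log[ Gamma(A)/Gamma(A+n) * prod_w Gamma(alpha_w+n_w)/Gamma(alpha_w) ]
    where A = sum_w alpha_w and n = sum_w n_w."""
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    A, n = alpha.sum(), counts.sum()
    return float(gammaln(A) - gammaln(A + n)
                 + np.sum(gammaln(alpha + counts) - gammaln(alpha)))

# Two documents over a 3-word vocabulary, both of length 6 and with
# the same expected word frequencies; one repeats a single word.
bursty = [4, 1, 1]   # one word used in a burst
flat   = [2, 2, 2]   # words spread evenly

p = np.array([1/3, 1/3, 1/3])   # multinomial parameter
alpha = 0.1 * p                 # small alpha -> strong burstiness

for doc in (bursty, flat):
    print(doc,
          "multinomial:", round(multinomial_loglik(doc, p), 3),
          "DCM:", round(dcm_loglik(doc, alpha), 3))
```

Running this, the multinomial assigns both documents the same log-likelihood (the counts sum to the same dot product with log p), while the DCM with small concentration parameters assigns noticeably higher likelihood to the bursty document. Shrinking alpha while holding the expected frequencies fixed is what lets the DCM reward within-document repetition; this per-document reuse effect is exactly what the per-topic multinomials of standard LDA cannot express.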