Proceedings of the 22nd ACM international conference on Information & knowledge management
We propose a hierarchical nonparametric topic model, based on the hierarchical Dirichlet process (HDP), that accounts for dependencies among the data. HDP mixture models are useful for discovering unknown semantic structure (i.e., topics) in unstructured data such as a corpus of documents. For simplicity, the HDP assumes exchangeability: any permutation of the data points yields the same joint probability of the data. This assumption is a problem in domains where there are clear and strong dependencies among the data. A model that allows for non-exchangeable data can capture these dependencies and assign higher probability to clusters that account for them, for example by inferring topics that reflect the temporal patterns of the data. Our model incorporates the distance dependent Chinese restaurant process (ddCRP), which clusters data with an inherent bias toward grouping data points that are near one another, into a hierarchical construction analogous to the HDP; we call this new prior the distance dependent Chinese restaurant franchise (ddCRF). When tested on temporal datasets, the ddCRF mixture model shows clear improvements in data fit over the HDP in terms of held-out likelihood and complexity. The resulting topics show sequential patterns of emergence and disappearance.
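
To illustrate the bias toward nearby data points described in the abstract, the following is a minimal sketch of drawing a clustering from a single-level ddCRP prior over time-stamped data. It is not the ddCRF itself and not the authors' implementation: the exponential decay function, the parameter names `alpha` and `decay`, and the helper names are assumptions chosen for illustration. Each data point links either to itself (with weight `alpha`) or to a point that is not later in time (with weight decaying in the time gap), and clusters are the connected components of the resulting links.

```python
import math
import random


def sample_ddcrp_links(times, alpha=1.0, decay=1.0, rng=None):
    """Draw one link per data point from a temporal ddCRP prior.

    Assumed for illustration: exponential decay f(d) = exp(-d / decay),
    and links only to points that are not later in time.
    """
    rng = rng or random.Random()
    n = len(times)
    links = []
    for i, t_i in enumerate(times):
        weights = []
        for j, t_j in enumerate(times):
            if j == i:
                weights.append(alpha)  # self-link: weight alpha
            elif t_j <= t_i:
                weights.append(math.exp(-(t_i - t_j) / decay))
            else:
                weights.append(0.0)  # never link to a later point
        links.append(rng.choices(range(n), weights=weights)[0])
    return links


def links_to_clusters(links):
    """Label connected components of the link graph (union-find)."""
    parent = list(range(len(links)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)

    labels = {}
    # Labels are assigned in order of first appearance.
    return [labels.setdefault(find(i), len(labels)) for i in range(len(links))]


if __name__ == "__main__":
    times = [0.0, 1.0, 2.0, 10.0, 11.0]  # hypothetical time stamps
    links = sample_ddcrp_links(times, alpha=1.0, decay=1.0, rng=random.Random(0))
    print(links, links_to_clusters(links))
```

With a small `decay`, points separated by a long time gap rarely link, so clusters tend to occupy contiguous stretches of time; this is the kind of sequential structure the abstract attributes to the learned topics.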