Topic models traditionally rely on the bag-of-words assumption. In data mining applications, this often results in end-users being presented with inscrutable lists of topical unigrams, single words inferred as representative of their topics. In this article, we present a hierarchical generative probabilistic model of topical phrases. The model simultaneously infers the location, length, and topic of phrases within a corpus and relaxes the bag-of-words assumption within phrases by using a hierarchy of Pitman-Yor processes. We use Markov chain Monte Carlo techniques for approximate inference in the model and perform slice sampling to learn its hyperparameters. We show via an experiment on human subjects that our model finds substantially better, more interpretable topical phrases than do competing models.
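The building block behind the hierarchy mentioned above is the Pitman-Yor process, which is commonly simulated with a generalized Chinese restaurant process: a customer joins an existing table k with weight proportional to (n_k - d), or opens a new table with weight proportional to (theta + d * K), where d is the discount, theta the concentration, and K the current number of tables. The sketch below illustrates a draw from a single such process; it is a minimal illustration under those standard definitions, not the paper's implementation, and the function and argument names (`pitman_yor_sample`, `base_sample`, etc.) are hypothetical.

```python
import random

def pitman_yor_sample(n_draws, discount, concentration, base_sample, seed=0):
    """Draw a sequence from one Pitman-Yor process via the generalized
    Chinese restaurant process. `base_sample()` draws from the base
    distribution G0 (here, a uniform distribution over word types).
    Illustrative sketch only; names are not from the paper."""
    rng = random.Random(seed)
    tables = []   # tables[k] = number of customers seated at table k
    dishes = []   # dishes[k] = the draw from G0 served at table k
    draws = []
    for _ in range(n_draws):
        # Existing table k attracts weight (n_k - discount);
        # a new table attracts weight (concentration + discount * K).
        weights = [count - discount for count in tables]
        weights.append(concentration + discount * len(tables))
        r = rng.random() * sum(weights)
        for k, w in enumerate(weights):
            r -= w
            if r <= 0:
                break
        if k == len(tables):       # open a new table: new dish from G0
            tables.append(1)
            dishes.append(base_sample())
        else:                      # join existing table k
            tables[k] += 1
        draws.append(dishes[k])
    return draws

# A discount in (0, 1) yields power-law table counts, matching the
# heavy-tailed word frequencies of natural language text.
vocab = [f"w{i}" for i in range(1000)]
sample = pitman_yor_sample(500, discount=0.8, concentration=1.0,
                           base_sample=lambda: random.choice(vocab))
```

In the model described above, processes like this one are chained into a hierarchy, in the spirit of the hierarchical Pitman-Yor language model: the predictive distribution over the next word within a phrase backs off through progressively shorter contexts, which is what relaxes the bag-of-words assumption inside phrases.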