One important approach for knowledge discovery and data mining is to estimate unobserved variables, because latent variables can reveal hidden properties of observed data. The latent factor model assumes that each item in a record has a latent factor; the co-occurrence of items can then be modeled by latent factors. In document modeling, a record corresponds to a document represented as a "bag of words" (the order of words is ignored), an item corresponds to a word, and a latent factor corresponds to a topic. Latent Dirichlet allocation (LDA) is a widely used Bayesian topic model that places a Dirichlet prior over the latent topic distribution of a document having multiple topics. LDA assumes that latent topics, i.e., discrete latent variables, are distributed according to a multinomial distribution whose parameters are generated from the Dirichlet distribution. LDA likewise models the word distribution with a multinomial distribution whose parameters follow the Dirichlet distribution. This Dirichlet-multinomial setting, however, cannot capture the power-law behavior of word distributions, known as Zipf's law in linguistics. We therefore propose a novel topic model based on the Pitman-Yor (PY) process, called the PY topic model. The PY topic model captures two properties of a document: a power-law word distribution and the presence of multiple topics. In experiments on real data, this model outperformed LDA in document modeling in terms of perplexity.
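The contrast between the Dirichlet-multinomial setting and the Pitman-Yor process can be illustrated with a small simulation. The sketch below (not the authors' model; the sampler, parameter values, and function name are illustrative assumptions) draws from the Pitman-Yor Chinese restaurant process: with discount d > 0 the distribution of "table" (word-type) sizes has a power-law tail, while d = 0 recovers the Dirichlet process, whose tail is much lighter.

```python
import random

def pitman_yor_crp(n, discount, concentration, rng):
    """Draw n customers from the Pitman-Yor Chinese restaurant process.

    Illustrative sketch: with discount > 0 the table-size distribution is
    heavy-tailed (power law); discount = 0 gives the ordinary Dirichlet
    process, which is not. Returns the list of table occupancy counts.
    """
    counts = []  # counts[k] = number of customers at table k
    total = 0    # customers seated so far
    for _ in range(n):
        k_tables = len(counts)
        # New table with probability (theta + d*K) / (theta + total);
        # the very first customer always opens a new table.
        if rng.random() < (concentration + discount * k_tables) / (concentration + total):
            counts.append(1)
        else:
            # Existing table k chosen with probability proportional to (counts[k] - d).
            r = rng.random() * (total - discount * k_tables)
            acc = 0.0
            for k, c in enumerate(counts):
                acc += c - discount
                if r < acc:
                    counts[k] += 1
                    break
        total += 1
    return counts

py_counts = pitman_yor_crp(10000, discount=0.5, concentration=1.0,
                           rng=random.Random(0))
dp_counts = pitman_yor_crp(10000, discount=0.0, concentration=1.0,
                           rng=random.Random(0))
# With discount > 0 the PY process yields far more rare types
# (a heavier tail), mimicking Zipf's law in word frequencies.
print(len(py_counts), len(dp_counts))
```

Running this, the PY sampler (d = 0.5) produces on the order of a hundred distinct types while the Dirichlet-process sampler (d = 0) produces only a handful, which is exactly the extra mass on rare words that the Dirichlet-multinomial setting cannot express.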