Probabilistic latent semantic indexing
Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Investigating the relationship between language model perplexity and IR precision-recall measures
Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
The Journal of Machine Learning Research
GaP: a factor model for discrete data
Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Applying discrete PCA in data analysis
Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI '04)
The author-topic model for authors and documents
Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI '04)
Pachinko allocation: DAG-structured mixture models of topic correlations
Proceedings of the 23rd International Conference on Machine Learning (ICML '06)
Mixtures of hierarchical topics with Pachinko allocation
Proceedings of the 24th International Conference on Machine Learning
Joint latent topic models for text and citations
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Evaluation methods for topic models
Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09)
Proceedings of the 2005 International Conference on Subspace, Latent Structure and Feature Selection (SLSFS '05)
Proceedings of the 10th Asian Conference on Computer Vision, Part II (ACCV '10)
Sampling table configurations for the hierarchical Poisson-Dirichlet process
Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases, Part I (ECML PKDD '11)
Non-Parametric Estimation of Topic Hierarchies from Texts with Hierarchical Dirichlet Processes
The Journal of Machine Learning Research
Topic models are a discrete analogue of principal component analysis (PCA) and independent component analysis (ICA) that model topics at the word level within a document. They have many variants, such as NMF, PLSI, and LDA, and are used in fields as diverse as genetics, text and web analysis, image analysis, and recommender systems. However, reasonable methods for estimating the likelihood of unseen documents, for instance to perform testing or model comparison, have only recently become available. This paper explores a number of these recent methods and improves their theory, performance, and testing.
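As a minimal sketch of the NMF variant mentioned in the abstract, the snippet below factors a synthetic term-document matrix with Lee-Seung multiplicative updates; the corpus, dimensions, and iteration count are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

# Synthetic rank-2 "term-document" matrix: 6 documents over 8 terms,
# generated from two hypothetical topics (illustrative data only).
rng = np.random.default_rng(0)
W_true = rng.integers(0, 4, size=(6, 2)).astype(float)  # doc-topic weights
H_true = rng.random((2, 8))                             # topic-word weights
V = W_true @ H_true

def nmf(V, k, iters=500, eps=1e-9):
    """Factor V ~= W @ H with Lee-Seung multiplicative updates,
    minimizing the Frobenius reconstruction error."""
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update topic-word factors
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update doc-topic factors
    return W, H

W, H = nmf(V, k=2)
err = np.linalg.norm(V - W @ H)  # small relative to ||V|| on this rank-2 data
```

Because the updates multiply each entry by a nonnegative ratio, W and H stay nonnegative throughout, which is what gives the factors their interpretation as additive topic and word weights.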