Probabilistic latent semantic indexing
Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Investigating the relationship between language model perplexity and IR precision-recall measures
Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
The Journal of Machine Learning Research
GaP: a factor model for discrete data
Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Applying discrete PCA in data analysis
Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI '04)
The author-topic model for authors and documents
Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI '04)
Pachinko allocation: DAG-structured mixture models of topic correlations
Proceedings of the 23rd International Conference on Machine Learning (ICML '06)
Mixtures of hierarchical topics with Pachinko allocation
Proceedings of the 24th International Conference on Machine Learning
Joint latent topic models for text and citations
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Evaluation methods for topic models
Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09)
Proceedings of the 2005 International Conference on Subspace, Latent Structure and Feature Selection (SLSFS '05)
Proceedings of the 10th Asian Conference on Computer Vision, Part II (ACCV '10)
Sampling table configurations for the hierarchical Poisson-Dirichlet process
Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases, Part I (ECML PKDD '11)
Non-Parametric Estimation of Topic Hierarchies from Texts with Hierarchical Dirichlet Processes
The Journal of Machine Learning Research
Topic models are a discrete analogue of principal component analysis (PCA) and independent component analysis (ICA) that model topics at the word level within a document. They have many variants, such as NMF, PLSI, and LDA, and are used in fields as diverse as genetics, text and web analysis, image analysis, and recommender systems. However, reasonable methods for estimating the likelihood of unseen documents, for instance to perform testing or model comparison, have only recently become available. This paper explores a number of these recent methods and improves their theory, performance, and testing.
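As a minimal sketch of the NMF variant mentioned in the abstract, the snippet below factors a synthetic term-document matrix with Lee-Seung multiplicative updates; the corpus, dimensions, and iteration count are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

# Synthetic rank-2 "term-document" matrix: 6 documents over 8 terms,
# generated from two hypothetical topics (illustrative data only).
rng = np.random.default_rng(0)
W_true = rng.integers(0, 4, size=(6, 2)).astype(float)  # doc-topic weights
H_true = rng.random((2, 8))                             # topic-word weights
V = W_true @ H_true

def nmf(V, k, iters=500, eps=1e-9):
    """Factor V ~= W @ H with Lee-Seung multiplicative updates,
    minimizing the Frobenius reconstruction error."""
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update topic-word factors
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update doc-topic factors
    return W, H

W, H = nmf(V, k=2)
err = np.linalg.norm(V - W @ H)  # small relative to ||V|| on this rank-2 data
```

Because the updates multiply each entry by a nonnegative ratio, W and H stay nonnegative throughout, which is what gives the factors their interpretation as additive topic and word weights.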