Probabilistic topic models have recently attracted much attention because of their successful application to many text mining tasks, such as retrieval, summarization, categorization, and clustering. Although many studies have reported promising performance for these models, none has systematically investigated their task performance. As a result, several critical questions that may affect every application of topic models remain largely unanswered, in particular how to choose between competing models, how multiple local maxima affect task performance, and how to set the parameters of a topic model. In this paper, we address these questions through a systematic investigation of two representative probabilistic topic models, probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA), on three representative text mining tasks: document clustering, text categorization, and ad-hoc retrieval. Our analysis of the experimental results provides a deeper understanding of topic models and many useful insights into how to optimize their performance on these typical tasks. The task-based evaluation framework generalizes to other topic models in the PLSA or LDA family.
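The question of multiple local maxima can be made concrete with a small sketch. The code below fits PLSA by standard EM on a toy document-term count matrix from several random initializations and records the final log-likelihood of each run. It is an illustration only, not the experimental setup of the paper: the function name `plsa_em`, the matrix `toy_counts`, the number of topics, the iteration count, and the seeds are all assumptions chosen for brevity.

```python
import math
import random


def normalize(row):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(row)
    if total == 0.0:
        # Degenerate case: fall back to a uniform distribution.
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsa_em(counts, num_topics, iterations, seed):
    """Fit PLSA with EM from a random start (hypothetical helper).

    Returns (p_w_z, p_z_d, trace), where p_w_z[z][w] = p(w|z),
    p_z_d[d][z] = p(z|d), and trace holds the log-likelihood of the
    parameters in use at the start of each iteration.
    """
    rng = random.Random(seed)
    num_docs, num_words = len(counts), len(counts[0])
    p_w_z = [normalize([rng.random() + 0.01 for _ in range(num_words)])
             for _ in range(num_topics)]
    p_z_d = [normalize([rng.random() + 0.01 for _ in range(num_topics)])
             for _ in range(num_docs)]
    trace = []
    for _ in range(iterations):
        new_w_z = [[0.0] * num_words for _ in range(num_topics)]
        new_z_d = [[0.0] * num_topics for _ in range(num_docs)]
        log_lik = 0.0
        for d in range(num_docs):
            for w in range(num_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: p(z|d,w) is proportional to p(z|d) * p(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(num_topics)]
                p_w_d = sum(joint)
                log_lik += n * math.log(p_w_d)
                for z in range(num_topics):
                    expected = n * joint[z] / p_w_d
                    new_w_z[z][w] += expected
                    new_z_d[d][z] += expected
        trace.append(log_lik)
        # M-step: renormalize the expected counts.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d, trace


# Toy document-term counts (assumed data): rows are documents, columns are words.
toy_counts = [
    [4, 3, 0, 0],
    [3, 4, 1, 0],
    [0, 0, 5, 2],
    [0, 1, 2, 5],
]

# Different seeds give different starting points and possibly different optima.
final_log_liks = {seed: plsa_em(toy_counts, 2, 30, seed)[2][-1]
                  for seed in range(5)}
for seed, value in final_log_liks.items():
    print(f"seed={seed} final log-likelihood={value:.4f}")
```

Comparing the final log-likelihoods across seeds gives a rough view of how much the optimum reached depends on initialization. In a task-based evaluation, the same comparison would be made on a downstream metric such as clustering or retrieval quality rather than on likelihood alone.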