The Sensitivity of Latent Dirichlet Allocation for Information Retrieval

Authors:
Laurence A. Park; Kotagiri Ramamohanarao
Affiliations:
Department of Computer Science and Software Engineering, The University of Melbourne, Australia 3010; Department of Computer Science and Software Engineering, The University of Melbourne, Australia 3010
Venue:
ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part II
Year:
2009

Abstract

Topic models have been shown to increase precision in information retrieval when used in the appropriate form. Latent Dirichlet Allocation (LDA) is a generative topic model that models documents using a Dirichlet prior. Fitting this topic model yields the Dirichlet parameter that maximizes the likelihood of the document set. In this article, we examine the sensitivity of LDA to the Dirichlet parameter when used for information retrieval. We compare the topic model computation time, storage requirements and retrieval precision of fitted LDA with those of LDA using a uniform Dirichlet prior. The results show that there is no significant benefit in using fitted LDA over LDA with a constant Dirichlet parameter, indicating that LDA is insensitive to the Dirichlet parameter when used for information retrieval.
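
To make the role of the Dirichlet parameter concrete, the following is a minimal sketch of the standard LDA generative process, not the authors' implementation or experimental setup. The function names, the toy topic-word table and the alpha values are all hypothetical, chosen only for illustration. A constant (uniform) Dirichlet parameter corresponds to passing the same value for every topic, whereas a fitted parameter would generally have a different value per topic.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, topic_word, length, rng):
    """Generate one document under the LDA generative process.

    alpha      -- Dirichlet parameter, one positive value per topic
    topic_word -- topic_word[k][w] = P(word w | topic k); each row sums to 1
    length     -- number of word tokens to generate
    """
    theta = sample_dirichlet(alpha, rng)  # per-document topic mixture
    topics = range(len(alpha))
    vocab = range(len(topic_word[0]))
    words = []
    for _ in range(length):
        z = rng.choices(topics, weights=theta)[0]         # choose a topic
        w = rng.choices(vocab, weights=topic_word[z])[0]  # choose a word
        words.append(w)
    return theta, words


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy model: 3 topics over a 3-word vocabulary.
    topic_word = [
        [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.2, 0.2, 0.6],
    ]
    uniform_alpha = [0.5, 0.5, 0.5]     # constant Dirichlet parameter
    nonuniform_alpha = [2.0, 0.3, 0.1]  # stand-in for a fitted parameter
    for alpha in (uniform_alpha, nonuniform_alpha):
        theta, words = generate_document(alpha, topic_word, 20, rng)
        print(alpha, [round(t, 3) for t in theta], words)
```

The sketch only generates documents; fitting the Dirichlet parameter by maximum likelihood, as described in the abstract, is a separate estimation step that is not shown here.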