Bayesian Folding-In with Dirichlet Kernels for PLSI

Authors:
Alexander Hinneburg;Hans-Henning Gabriel;André Gohr
Venue:
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Year:
2007

Citing 0
Cited 3

An Ad Hoc Information Retrieval Perspective on PLSI through Language Model Identification
ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory

PLSI: The True Fisher Kernel and beyond
ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I

RPLSA: A novel updating scheme for Probabilistic Latent Semantic Analysis
Computer Speech and Language

Abstract

Probabilistic latent semantic indexing (PLSI) represents the documents of a collection as mixture proportions of latent topics, which are learned from the collection by an expectation maximization (EM) algorithm. New documents or queries need to be folded into the latent topic space by a simplified version of the EM algorithm. During PLSI folding-in of a new document, the topic mixtures of the known documents are ignored. This may lead to a suboptimal model of the extended collection. Our new approach incorporates the topic mixtures of the known documents in a Bayesian way during folding-in. That knowledge is modeled as a prior distribution over the topic simplex using a kernel density estimate with Dirichlet kernels. We demonstrate the advantages of the new Bayesian folding-in using real text data.
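The baseline that the abstract refers to, standard PLSI folding-in, can be illustrated with a minimal sketch. It is not the authors' code: the function name fold_in, the toy topic-word table, and the iteration count are assumptions chosen for illustration. The topic-word distributions p(w|z) are held fixed, and EM re-estimates only the topic proportions of the new document. The Bayesian variant proposed in the paper would additionally use a prior built from the known documents' topic mixtures; its exact form is not given in the abstract, so it is not sketched here.

```python
def fold_in(counts, p_w_given_z, n_iter=50):
    """Estimate topic proportions for one new document.

    counts:       dict mapping word index -> count in the new document
    p_w_given_z:  list of per-topic word distributions, kept fixed
    (Both argument layouts are assumptions made for this sketch.)
    """
    n_topics = len(p_w_given_z)
    theta = [1.0 / n_topics] * n_topics  # uniform starting point

    for _ in range(n_iter):
        expected = [0.0] * n_topics
        for w, n_w in counts.items():
            # E-step: posterior p(z | w, d) is proportional to theta_z * p(w | z)
            joint = [theta[z] * p_w_given_z[z][w] for z in range(n_topics)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # word has zero probability under every topic
            for z in range(n_topics):
                expected[z] += n_w * joint[z] / norm

        # M-step: renormalize expected topic counts; p(w | z) is not updated
        total = sum(expected)
        if total == 0.0:
            break
        theta = [e / total for e in expected]

    return theta


# Toy example: 2 topics over a 4-word vocabulary (made-up numbers).
p_w_given_z = [
    [0.50, 0.40, 0.05, 0.05],  # topic 0 favors words 0 and 1
    [0.05, 0.05, 0.50, 0.40],  # topic 1 favors words 2 and 3
]
new_doc = {0: 3, 1: 2}  # the new document uses only words 0 and 1
theta = fold_in(new_doc, p_w_given_z)
print(theta)
```

In this toy case the estimated proportions concentrate on topic 0, because the new document contains only words that topic 0 favors. The estimate depends only on the new document and the fixed topics, which is the limitation the abstract points out: the topic mixtures of the known documents play no role.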