A novel updating scheme for probabilistic latent semantic indexing

Authors:
Constantine Kotropoulos;Athanasios Papaioannou
Affiliations:
Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece;Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
Venue:
SETN'06 Proceedings of the 4th Hellenic conference on Advances in Artificial Intelligence
Year:
2006




Abstract

Probabilistic Latent Semantic Indexing (PLSI) is a statistical technique for automatic document indexing. A novel method is proposed for updating PLSI when new documents arrive. The proposed method incrementally adds the words of each new document to the term-document matrix and derives updating equations for the probability of terms given the class (i.e., latent) variables and the probability of documents given the latent variables. Its performance is compared to that of the folding-in algorithm, an inexpensive but potentially inaccurate updating method. The proposed updating algorithm is shown to outperform folding-in with respect to the mean squared error between the aforementioned probabilities as estimated by each updating method and by the original non-adaptive PLSI algorithm.
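
For orientation, the folding-in baseline named in the abstract can be sketched as follows. This is a minimal illustration of the commonly described folding-in procedure for PLSI, not the updating scheme proposed in the paper, whose equations the abstract does not give. It assumes the term distributions P(w | z) from an already trained model are held fixed and that only the topic mixing weights P(z | d_new) of the new document are re-estimated by EM; the abstract itself speaks of P(d | z), so this parameterization is an assumption. The function name `fold_in`, its parameters, and the data layout are hypothetical.

```python
def fold_in(new_doc_counts, p_w_given_z, n_iter=50):
    """Estimate P(z | d_new) for one new document by EM (illustrative sketch).

    new_doc_counts: dict mapping word index -> count in the new document.
    p_w_given_z:    list over topics z of lists over words w holding P(w | z),
                    taken from a previously trained model and kept fixed here.
    """
    n_topics = len(p_w_given_z)
    # Assumption: start from a uniform topic mixture for the new document.
    p_z_given_d = [1.0 / n_topics] * n_topics

    for _ in range(n_iter):
        new_weights = [0.0] * n_topics
        for w, count in new_doc_counts.items():
            # E-step: posterior P(z | d_new, w) is proportional to
            # P(w | z) * P(z | d_new).
            joint = [p_w_given_z[z][w] * p_z_given_d[z] for z in range(n_topics)]
            total = sum(joint)
            if total == 0.0:
                continue  # word has zero probability under every topic
            for z in range(n_topics):
                new_weights[z] += count * joint[z] / total

        # M-step: renormalize the expected counts into P(z | d_new).
        norm = sum(new_weights)
        if norm == 0.0:
            break  # no usable evidence; keep the current estimate
        p_z_given_d = [v / norm for v in new_weights]

    return p_z_given_d
```

Because P(w | z) is never revised, words from new documents cannot change the term distributions, which is one way to read the abstract's remark that folding-in is inexpensive but potentially inaccurate.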