A novel updating scheme for probabilistic latent semantic indexing

  • Authors:
  • Constantine Kotropoulos;Athanasios Papaioannou

  • Affiliations:
  • Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece;Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece

  • Venue:
  • SETN'06 Proceedings of the 4th Hellenic conference on Advances in Artificial Intelligence
  • Year:
  • 2006

Abstract

Probabilistic Latent Semantic Indexing (PLSI) is a statistical technique for automatic document indexing. A novel method is proposed for updating PLSI when new documents arrive. The proposed method incrementally adds the words of each new document to the term-document matrix and derives updating equations for the probability of terms given the class (i.e., latent) variables and the probability of documents given the latent variables. Its performance is compared to that of the folding-in algorithm, an inexpensive but potentially inaccurate updating method. The proposed updating algorithm is shown to outperform folding-in with respect to the mean squared error between these probabilities, as estimated by each of the two updating methods, and those obtained by the original non-adaptive PLSI algorithm.
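
For readers unfamiliar with the baseline, the following is a minimal sketch of folding-in and of a mean squared error comparison. It is not the updating scheme proposed in the paper, whose equations the abstract does not give. It assumes the common asymmetric PLSI parameterization, in which a new document is described by P(z | d_new) while the term probabilities P(w | z) stay fixed; the abstract itself speaks of P(d | z). The function names `fold_in` and `mean_squared_error` and the toy numbers are illustrative assumptions.

```python
def fold_in(counts, p_w_given_z, n_iter=50):
    """Estimate P(z | d_new) for one new document by folding-in.

    counts       -- term counts n(d_new, w), one entry per vocabulary word
    p_w_given_z  -- p_w_given_z[k][w] = P(w | z_k), held fixed (assumption:
                    asymmetric PLSI parameterization)
    """
    n_topics = len(p_w_given_z)
    if sum(counts) == 0:
        raise ValueError("document contains no known terms")
    p_z = [1.0 / n_topics] * n_topics  # uniform start for P(z | d_new)
    for _ in range(n_iter):
        acc = [0.0] * n_topics
        used = 0.0
        for w, n_dw in enumerate(counts):
            if n_dw == 0:
                continue
            # E-step: P(z | d_new, w) is proportional to P(w | z) P(z | d_new)
            joint = [p_w_given_z[k][w] * p_z[k] for k in range(n_topics)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # word has zero probability under every topic
            for k in range(n_topics):
                acc[k] += n_dw * joint[k] / norm
            used += n_dw
        if used == 0.0:
            break
        # M-step: re-estimate only P(z | d_new); P(w | z) is never touched
        p_z = [a / used for a in acc]
    return p_z


def mean_squared_error(estimated, reference):
    """Mean squared error between two equally long probability vectors."""
    if len(estimated) != len(reference):
        raise ValueError("vectors must have the same length")
    return sum((e - r) ** 2 for e, r in zip(estimated, reference)) / len(estimated)


if __name__ == "__main__":
    # Toy model: two latent classes over a four-word vocabulary.
    topics = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
    new_doc = [3, 1, 0, 0]  # uses only words of the first class
    print(fold_in(new_doc, topics))
```

Because P(w | z) is frozen, folding-in is cheap, but words of the new document cannot change the term distributions, which is the source of the inaccuracy mentioned above. In an evaluation of the kind the abstract describes, `mean_squared_error` would be applied between the probabilities produced by an updating method and those from retraining PLSI on the full collection.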