Query expansion using a collection dependent probabilistic latent semantic thesaurus

  • Authors:
  • Laurence A. F. Park;Kotagiri Ramamohanarao

  • Affiliations:
  • ARC Centre for Perceptive and Intelligent Machines in Complex Environments, Department of Computer Science and Software Engineering, The University of Melbourne, Australia;ARC Centre for Perceptive and Intelligent Machines in Complex Environments, Department of Computer Science and Software Engineering, The University of Melbourne, Australia

  • Venue:
  • PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
  • Year:
  • 2007

Abstract

Many queries on collections of text documents are too short to produce informative results. Automatic query expansion is a method of adding terms to the query without interaction from the user in order to obtain more refined results. In this investigation, we examine our novel automatic query expansion method using the probabilistic latent semantic thesaurus, which is based on probabilistic latent semantic analysis. We show how to construct the thesaurus by mining text documents for probabilistic term relationships, and we show that by using the latent semantic thesaurus, we can overcome many of the previously identified problems associated with latent semantic analysis on large document sets. Experiments using TREC document sets show that our term expansion method outperforms the popular probabilistic pseudo-relevance feedback method by 7.3%.
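
To make the idea of a thesaurus built from probabilistic term relationships concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the formula, so the sketch assumes a term-to-term relationship of the form P(t_j | t_i) = sum over topics z of P(t_j | z) P(z | t_i). It also assumes the topic model has already been fitted, so the values of P(t | z) and P(z) are supplied as inputs. The function names `build_thesaurus` and `expand_query`, the parameter `n_extra`, and the toy probabilities are all hypothetical.

```python
def build_thesaurus(p_term_given_topic, p_topic):
    """Return a term-by-term table T where T[i][j] approximates P(t_j | t_i).

    p_term_given_topic[t][z] holds P(t | z); p_topic[z] holds P(z).
    Assumed relationship (not taken from the abstract):
        P(t_j | t_i) = sum_z P(t_j | z) * P(z | t_i)
    """
    n_terms = len(p_term_given_topic)
    n_topics = len(p_topic)
    thesaurus = []
    for i in range(n_terms):
        # Bayes' rule: P(z | t_i) is proportional to P(t_i | z) * P(z).
        joint = [p_term_given_topic[i][z] * p_topic[z] for z in range(n_topics)]
        total = sum(joint)
        if total > 0:
            p_topic_given_term = [value / total for value in joint]
        else:
            p_topic_given_term = [0.0] * n_topics
        # Mix the per-topic term distributions by P(z | t_i).
        row = [
            sum(p_term_given_topic[j][z] * p_topic_given_term[z]
                for z in range(n_topics))
            for j in range(n_terms)
        ]
        thesaurus.append(row)
    return thesaurus


def expand_query(query_ids, thesaurus, n_extra=2):
    """Append the n_extra terms most strongly related to the query terms."""
    n_terms = len(thesaurus)
    # Score each term by its summed relationship to all query terms.
    scores = [sum(thesaurus[q][j] for q in query_ids) for j in range(n_terms)]
    candidates = [j for j in range(n_terms) if j not in query_ids]
    # Stable sort: ties keep their original term order.
    candidates.sort(key=lambda j: -scores[j])
    return list(query_ids) + candidates[:n_extra]


# Toy data (hypothetical): 4 terms, 2 topics.
# Terms 0 and 1 occur only in topic 0; terms 2 and 3 only in topic 1.
P_TERM_GIVEN_TOPIC = [
    [0.5, 0.0],
    [0.5, 0.0],
    [0.0, 0.5],
    [0.0, 0.5],
]
P_TOPIC = [0.5, 0.5]

THESAURUS = build_thesaurus(P_TERM_GIVEN_TOPIC, P_TOPIC)
EXPANDED = expand_query([0], THESAURUS, n_extra=1)
```

In this toy setting, term 0 shares a topic only with term 1, so a query containing term 0 is expanded with term 1. Because the thesaurus is computed once per collection, expansion at query time reduces to a table lookup.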