On the number of terms used in automatic query expansion

  • Authors:
  • Paul Ogilvie; Ellen Voorhees; Jamie Callan

  • Affiliations:
  • Language Technologies Institute, Carnegie Mellon University, Pittsburgh, USA and mSpoke Inc., Pittsburgh, USA; National Institute of Standards and Technology, Gaithersburg, USA; Language Technologies Institute, Carnegie Mellon University, Pittsburgh, USA

  • Venue:
  • Information Retrieval
  • Year:
  • 2009

Abstract

This paper investigates how many expansion terms to use in automatic query expansion by examining the behavior of eight retrieval systems that participated in the NRRC Reliable Information Access Workshop. The results demonstrate that current systems already obtain nearly all of the benefit available from using a fixed number of expansion terms per topic, but that significant additional improvement would be possible if systems could accurately select the best number of expansion terms on a per-topic basis. When average effectiveness, measured by mean average precision, is optimized, using a fixed number of terms raises the score by a large amount for a small number of topics but has little effect for most topics. The analysis further suggests that when a topic is helped by automatic feedback, the increase comes from a set of terms that reinforce each other rather than from the system finding a single excellent term.
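The comparison described in the abstract, one fixed number of expansion terms for all topics versus the best number chosen per topic, can be illustrated with a minimal sketch. The topic identifiers, the candidate term counts, and the average-precision values below are hypothetical and are not taken from the paper; the helper names `best_fixed_k` and `oracle_map` are likewise assumptions made only for this illustration.

```python
from statistics import mean

# Hypothetical data: ap[topic][k] is the average precision for that topic
# when the query is expanded with k terms (k = 0 means no expansion).
ap = {
    "t1": {0: 0.20, 5: 0.22, 10: 0.21, 20: 0.19},
    "t2": {0: 0.10, 5: 0.30, 10: 0.45, 20: 0.40},
    "t3": {0: 0.35, 5: 0.33, 10: 0.34, 20: 0.36},
}


def best_fixed_k(ap):
    """Return (k, MAP) for the single k that maximizes mean average precision."""
    ks = sorted(next(iter(ap.values())))
    map_by_k = {k: mean(scores[k] for scores in ap.values()) for k in ks}
    k_star = max(ks, key=lambda k: map_by_k[k])
    return k_star, map_by_k[k_star]


def oracle_map(ap):
    """Return the MAP obtained if the best k were chosen separately per topic."""
    return mean(max(scores.values()) for scores in ap.values())


k_star, fixed_map = best_fixed_k(ap)
per_topic_map = oracle_map(ap)

print(f"best fixed k = {k_star}, MAP = {fixed_map:.4f}")
print(f"per-topic oracle MAP = {per_topic_map:.4f}")
```

By construction the per-topic oracle score can never fall below the best fixed-k score, so the gap between the two is an upper bound on what perfect per-topic selection could add. In this toy data most of the fixed-k gain comes from a single topic (`t2`), which mirrors the pattern the abstract describes only by design of the example.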