Using controlled query generation to evaluate blind relevance feedback algorithms

  • Authors:
  • Chris Jordan; Carolyn Watters; Qigang Gao

  • Affiliations:
  • Dalhousie University, Halifax, NS, Canada (all authors)

  • Venue:
  • Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
  • Year:
  • 2006

Abstract

Document retrieval currently offers many algorithms, each with different strengths and weaknesses. It is difficult, however, to evaluate the impact of the test query set on retrieval results. The traditional evaluation process, the Cranfield evaluation paradigm, which uses a corpus and a set of user queries, focuses on making the queries as realistic as possible. Unfortunately, such query sets lack the fine-grained control necessary to test algorithm properties. We present an approach called Controlled Query Generation (CQG) that creates query sets from documents in the corpus in a way that regulates the information-theoretic quality of each query. This allows us to generate reproducible and well-defined sets of queries of varying length and term specificity. Imposing this level of control over the query sets used for testing retrieval algorithms enables the rigorous simulation of different query environments to identify specific algorithm properties before introducing user queries. In this work, we demonstrate the usefulness of CQG by generating three different query environments to investigate characteristics of two blind relevance feedback approaches.
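
The abstract only outlines how CQG regulates query quality. As a rough, hedged illustration of the general idea, the Python sketch below assumes term specificity is approximated by inverse document frequency (IDF) and draws a fixed number of terms from a source document, biased toward high-IDF (specific) or low-IDF (general) terms. The function names, weighting scheme, and toy corpus are illustrative assumptions, not the authors' actual procedure.

```python
import math
import random

def idf(term, corpus):
    """Inverse document frequency, used here as a rough proxy for term specificity."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(1 + len(corpus) / (1 + df))

def generate_query(source_doc, corpus, length, prefer_specific=True, seed=0):
    """Draw `length` distinct terms from one corpus document, weighting the draw
    toward high-IDF terms (specific queries) or low-IDF terms (general queries).
    Fixing the seed keeps the generated query set reproducible."""
    rng = random.Random(seed)
    terms = sorted(set(source_doc))           # deterministic order before sampling
    weights = [idf(t, corpus) for t in terms]
    if not prefer_specific:
        top = max(weights)
        weights = [top - w + 1e-9 for w in weights]  # favour common terms instead
    query = []
    for _ in range(min(length, len(terms))):
        r = rng.uniform(0, sum(weights))      # weighted draw without replacement
        acc = 0.0
        for i, w in enumerate(weights):
            acc += w
            if acc >= r:
                query.append(terms.pop(i))
                weights.pop(i)
                break
    return query

# Toy corpus: each document reduced to its set of terms.
corpus = [
    {"blind", "relevance", "feedback", "retrieval", "query"},
    {"query", "corpus", "evaluation", "cranfield", "paradigm"},
    {"term", "specificity", "length", "query", "generation"},
]
print(generate_query(corpus[0], corpus, length=3))                         # specific query
print(generate_query(corpus[1], corpus, length=5, prefer_specific=False))  # general query
```

Varying `length` and `prefer_specific` in a sketch like this yields reproducible query environments of controlled length and term specificity, which is the kind of control the abstract describes for probing blind relevance feedback algorithms.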