Using controlled query generation to evaluate blind relevance feedback algorithms

  • Authors:
  • Chris Jordan; Carolyn Watters; Qigang Gao

  • Affiliations:
  • Dalhousie University, Halifax, NS, Canada (all authors)

  • Venue:
  • Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
  • Year:
  • 2006

Abstract

Document retrieval currently offers many algorithms, each with different strengths and weaknesses. It is difficult, however, to evaluate the impact of the test query set on retrieval results. The traditional evaluation process, the Cranfield evaluation paradigm, which uses a corpus and a set of user queries, focuses on making the queries as realistic as possible. Unfortunately, such query sets lack the fine-grained control necessary to test algorithm properties. We present an approach called Controlled Query Generation (CQG) that creates query sets from documents in the corpus in a way that regulates the information-theoretic quality of each query. This allows us to generate reproducible and well-defined sets of queries of varying length and term specificity. Imposing this level of control over the query sets used for testing retrieval algorithms enables the rigorous simulation of different query environments to identify specific algorithm properties before introducing user queries. In this work, we demonstrate the usefulness of CQG by generating three different query environments to investigate characteristics of two blind relevance feedback approaches.
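
The abstract only outlines how CQG regulates query quality. As a rough, hedged illustration of the general idea, the Python sketch below assumes term specificity is approximated by inverse document frequency (IDF) and draws a fixed number of terms from a source document, biased toward high-IDF (specific) or low-IDF (general) terms. The function names, weighting scheme, and toy corpus are illustrative assumptions, not the authors' actual procedure.

```python
import math
import random

def idf(term, corpus):
    """Inverse document frequency, used here as a rough proxy for term specificity."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(1 + len(corpus) / (1 + df))

def generate_query(source_doc, corpus, length, prefer_specific=True, seed=0):
    """Draw `length` distinct terms from one corpus document, weighting the draw
    toward high-IDF terms (specific queries) or low-IDF terms (general queries).
    Fixing the seed keeps the generated query set reproducible."""
    rng = random.Random(seed)
    terms = sorted(set(source_doc))           # deterministic order before sampling
    weights = [idf(t, corpus) for t in terms]
    if not prefer_specific:
        top = max(weights)
        weights = [top - w + 1e-9 for w in weights]  # favour common terms instead
    query = []
    for _ in range(min(length, len(terms))):
        r = rng.uniform(0, sum(weights))      # weighted draw without replacement
        acc = 0.0
        for i, w in enumerate(weights):
            acc += w
            if acc >= r:
                query.append(terms.pop(i))
                weights.pop(i)
                break
    return query

# Toy corpus: each document reduced to its set of terms.
corpus = [
    {"blind", "relevance", "feedback", "retrieval", "query"},
    {"query", "corpus", "evaluation", "cranfield", "paradigm"},
    {"term", "specificity", "length", "query", "generation"},
]
print(generate_query(corpus[0], corpus, length=3))                         # specific query
print(generate_query(corpus[1], corpus, length=5, prefer_specific=False))  # general query
```

Varying `length` and `prefer_specific` in a sketch like this yields reproducible query environments of controlled length and term specificity, which is the kind of control the abstract describes for probing blind relevance feedback algorithms.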