We propose a novel query expansion method for Language Modeling (LM) based Information Retrieval (IR), which expands the query with sentences from the top-ranked documents of an initial retrieval run, selected by their similarity to the query. In justification of our approach, we argue that the terms of the expanded query obtained by the proposed method approximately follow a Dirichlet distribution, which, being the conjugate prior of the multinomial distribution used in the LM retrieval model, helps the feedback step. IR experiments on the TREC ad-hoc retrieval test collections show that sentence-based query expansion (SBQE) yields a significant increase in Mean Average Precision (MAP) over baselines obtained with standard term-based query expansion using the LM term selection score and the Relevance Model (RLM). The proposed approach increases the likelihood of generating the pseudo-relevant documents: for each top-ranked pseudo-relevant document, it adds the sentences with maximum term overlap with the query, thus making the query look more like these documents. A per-topic analysis shows that the new method hurts fewer queries than the baseline feedback methods and improves average precision (AP) over a broad range of queries, from easy to difficult in terms of initial retrieval AP. We also show that the new method adds a larger number of good feedback terms, taking the terms added by true relevance feedback as the gold standard. Additional experiments on the challenging search topics of the TREC 2004 Robust track show that the new method improves MAP by 5.7% without the external resources and query hardness prediction typically used for these topics.
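The core expansion step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: whitespace tokenization, splitting documents into sentences on periods, and the `sents_per_doc` parameter are all simplifying choices made here for clarity.

```python
def sentence_overlap(sentence, query_terms):
    """Count the distinct query terms that appear in a sentence."""
    return len(set(sentence.lower().split()) & query_terms)

def sbqe(query, docs, sents_per_doc=2):
    """Sketch of sentence-based query expansion (SBQE).

    For each pseudo-relevant document, rank its sentences by term
    overlap with the query and append the terms of the top-ranked
    sentences to the query, making the expanded query resemble the
    pseudo-relevant documents.
    """
    query_terms = set(query.lower().split())
    expanded = query.lower().split()
    for doc in docs:
        # Naive sentence segmentation on periods (an assumption here).
        sentences = [s for s in doc.split(".") if s.strip()]
        ranked = sorted(
            sentences,
            key=lambda s: sentence_overlap(s, query_terms),
            reverse=True,
        )
        for sent in ranked[:sents_per_doc]:
            expanded.extend(sent.lower().split())
    return expanded
```

For example, with the query `"language model retrieval"` and two pseudo-relevant documents, only the sentences sharing terms with the query contribute their vocabulary to the expanded query; off-topic sentences in the same documents are skipped.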