Exploiting External Collections for Query Expansion

Authors:
Wouter Weerkamp;Krisztian Balog;Maarten de Rijke
Affiliations:
University of Amsterdam;NTNU Trondheim;University of Amsterdam
Venue:
ACM Transactions on the Web (TWEB)
Year:
2012

References (29):

Concept based query expansion
SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval

A language modeling approach to information retrieval
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval

A hidden Markov model information retrieval system
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval

Relevance based language models
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval

Length normalization in XML retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval

A framework for selective query expansion
Proceedings of the thirteenth ACM international conference on Information and knowledge management

Better than the real thing?: iterative pseudo-query processing using cluster-based language models
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

A Markov random field model for term dependencies
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

Improving the estimation of relevance models using large external corpora
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

Regularized estimation of mixture models for robust pseudo-relevance feedback
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

Combining fields for query expansion and adaptive query expansion
Information Processing and Management: an International Journal

Query expansion using probabilistic local feedback with application to multimedia retrieval
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management

Selecting good expansion terms for pseudo-relevance feedback
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

Retrieval and feedback models for blog feed search
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

A few examples go a long way: constructing query models from elaborate query formulations
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

Introduction to Information Retrieval
Introduction to Information Retrieval

Query Expansion Using External Evidence
ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval

Meme-tracking and the dynamics of the news cycle
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

Query dependent pseudo-relevance feedback based on wikipedia
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

Adaptive relevance feedback in information retrieval
Proceedings of the 18th ACM conference on Information and knowledge management

A query model based on normalized log-likelihood
Proceedings of the 18th ACM conference on Information and knowledge management

Finding good feedback documents
Proceedings of the 18th ACM conference on Information and knowledge management

Towards recency ranking in web search
Proceedings of the third ACM international conference on Web search and data mining

What is Twitter, a social network or a news media?
Proceedings of the 19th international conference on World wide web

Modern Information Retrieval
Modern Information Retrieval

Generating focused topic-specific sentiment lexicons
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

LambdaMerge: merging the results of query reformulations
Proceedings of the fourth ACM international conference on Web search and data mining

A study of blog search
ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval

Credibility-inspired ranking for blog post retrieval
Information Retrieval

Cited by (1):

Using temporal bursts for query modeling
Information Retrieval

Abstract

A persistent challenge in information retrieval is the vocabulary mismatch between a user’s information need and the relevant documents. One way of addressing this issue is query modeling: adding terms to the original query and reweighting the terms. In social media, where documents often contain creative and noisy language (e.g., spelling and grammatical errors), query modeling proves difficult. To address this, external sources have been used for query modeling, with apparent success. In this article we propose a general generative query expansion model that uses external document collections for term generation: the External Expansion Model (EEM). The main rationale behind our model is the hypothesis that each query requires its own mixture of external collections for expansion and that an expansion model should account for this. For some queries we expect, for example, a news collection to be most beneficial, while for other queries we could benefit more from selecting terms from a general encyclopedia. EEM allows for query-dependent weighting of the external collections. We evaluate our model on the task of blog post retrieval, using four external collections: (i) a news collection, (ii) a Web collection, (iii) Wikipedia, and (iv) a blog post collection. Experiments show that EEM outperforms query expansion on the individual collections, as well as the Mixture of Relevance Models previously proposed by Diaz and Metzler [2006]. Extensive analysis of the results shows that our naive approach to estimating query-dependent collection importance works reasonably well and that “oracle” settings reveal the full potential of our model. We also find that query-dependent collection importance has more impact on retrieval performance than query-independent collection importance (i.e., a collection prior).
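
The core idea of query-dependent collection weighting can be illustrated with a minimal sketch. This is not the EEM estimation procedure from the article, whose details the abstract does not give; it only assumes that each external collection supplies a term distribution for the query and that some query-dependent weight per collection is available. The function name `mix_expansion_terms`, the toy collections, and all probabilities below are hypothetical.

```python
from collections import defaultdict


def mix_expansion_terms(term_dists, collection_weights, top_k=10):
    """Mix per-collection expansion term distributions into one model.

    term_dists: {collection: {term: P(term | query, collection)}}
    collection_weights: {collection: query-dependent importance}
    Both inputs are assumed to be estimated elsewhere (hypothetical here).
    """
    total = sum(collection_weights.get(c, 0.0) for c in term_dists)
    if total <= 0:
        raise ValueError("at least one collection needs a positive weight")

    mixed = defaultdict(float)
    for collection, dist in term_dists.items():
        # Normalize the weights so they form a distribution over collections.
        weight = collection_weights.get(collection, 0.0) / total
        for term, prob in dist.items():
            mixed[term] += weight * prob

    # Keep the top_k terms and renormalize them to sum to one.
    top = sorted(mixed.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    norm = sum(prob for _, prob in top)
    if norm == 0:
        return {}
    return {term: prob / norm for term, prob in top}


# Toy term distributions for a single query (made-up numbers).
term_dists = {
    "news": {"election": 0.6, "vote": 0.4},
    "wikipedia": {"democracy": 0.7, "vote": 0.3},
}

# The same query under two different collection weightings.
news_heavy = mix_expansion_terms(term_dists, {"news": 0.8, "wikipedia": 0.2})
wiki_heavy = mix_expansion_terms(term_dists, {"news": 0.2, "wikipedia": 0.8})

print(max(news_heavy, key=news_heavy.get))
print(max(wiki_heavy, key=wiki_heavy.get))
```

Shifting weight from the news collection to the encyclopedia changes which expansion term dominates, which is the behavior the abstract motivates: a single fixed collection prior cannot express this per-query preference.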