Selecting good expansion terms for pseudo-relevance feedback

Authors:
Guihong Cao;Jian-Yun Nie;Jianfeng Gao;Stephen Robertson
Affiliations:
University of Montreal, Montreal, PQ, Canada;University of Montreal, Montreal, PQ, Canada;Microsoft Research, Redmond, WA, USA;Microsoft Research at Cambridge, Cambridge, United Kingdom
Venue:
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2008

Citing 12
Cited 84

On term selection for query expansion

Journal of Documentation
Query expansion using local and global document analysis

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Relevance based language models

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
A study of smoothing methods for language models applied to Ad Hoc information retrieval

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Model-based feedback in the language modeling approach to information retrieval

Proceedings of the tenth international conference on Information and knowledge management
Text Categorization with Support Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Linear discriminant model for information retrieval

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Regularized estimation of mixture models for robust pseudo-relevance feedback

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Pattern Recognition and Machine Learning (Information Science and Statistics)

Pattern Recognition and Machine Learning (Information Science and Statistics)
Using query contexts in information retrieval

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
An exploration of proximity measures in information retrieval

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Latent concept expansion using markov random fields

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval

A semi-supervised incremental algorithm to automatically formulate topical queries

Information Sciences: an International Journal
Query Expansion Using External Evidence

ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval
Studying Query Expansion Effectiveness

ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval
Query dependent pseudo-relevance feedback based on wikipedia

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Effective query expansion for federated search

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Placing flickr photos on a map

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
A User Profiles Acquiring Approach Using Pseudo-Relevance Feedback

RSKT '09 Proceedings of the 4th International Conference on Rough Sets and Knowledge Technology
"A term is known by the company it keeps": On Selecting a Good Expansion Set in Pseudo-Relevance Feedback

ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory
Reducing the risk of query expansion via robust constrained optimization

Proceedings of the 18th ACM conference on Information and knowledge management
A term dependency-based approach for query terms ranking

Proceedings of the 18th ACM conference on Information and knowledge management
A study of selective collection enrichment for enterprise search

Proceedings of the 18th ACM conference on Information and knowledge management
Finding good feedback documents

Proceedings of the 18th ACM conference on Information and knowledge management
Selecting Effective Terms for Query Formulation

AIRS '09 Proceedings of the 5th Asia Information Retrieval Symposium on Information Retrieval Technology
Pseudo relevance feedback with incremental learning for high level feature detection

ICME'09 Proceedings of the 2009 IEEE international conference on Multimedia and Expo
Learning concept importance using a weighted dependence model

Proceedings of the third ACM international conference on Web search and data mining
Positional relevance model for pseudo-relevance feedback

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Multilingual PRF: english lends a helping hand

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Investigating the suboptimality and instability of pseudo-relevance feedback

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Mining positive and negative patterns for relevance feature discovery

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Multilingual pseudo-relevance feedback: performance study of assisting languages

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Improved latent concept expansion using hierarchical markov random fields

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Clickthrough-based translation models for web search: from word models to phrase models

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Query model refinement using word graphs

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Modeling reformulation using passage analysis

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
A relative word-frequency based method for relevance feedback

AIMSA'10 Proceedings of the 14th international conference on Artificial intelligence: methodology, systems, and applications
Exploring social annotation tags to enhance information retrieval performance

AMT'10 Proceedings of the 6th international conference on Active media technology
Using text classification method in relevance feedback

ACIIDS'10 Proceedings of the Second international conference on Intelligent information and database systems: Part II
A framework for automatic query expansion

WISM'10 Proceedings of the 2010 international conference on Web information systems and mining
Dynamic ranked retrieval

Proceedings of the fourth ACM international conference on Web search and data mining
Classifying and filtering blind feedback terms to improve information retrieval effectiveness

RIAO '10 Adaptivity, Personalization and Fusion of Heterogeneous Information
Negative feedback: the forsaken nature available for re-ranking

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Improving effectiveness of query expansion using information theoretic approach

IEA/AIE'10 Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part II
Query expansion based on clustered results

Proceedings of the VLDB Endowment
A Bayesian network approach to context sensitive query expansion

Proceedings of the 2011 ACM Symposium on Applied Computing
TEMPER: a temporal relevance feedback method

ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
A boosting approach to improving pseudo-relevance feedback

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Social annotation in query expansion: a machine learning approach

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Parameterized concept weighting in verbose queries

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Exploring term temporality for pseudo-relevance feedback

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Query expansion for language modeling using sentence similarities

IRFC'11 Proceedings of the Second international conference on Multidisciplinary information retrieval facility
Mining Concept Sequences from Large-Scale Search Logs for Context-Aware Query Suggestion

ACM Transactions on Intelligent Systems and Technology (TIST)
A pseudo relevance feedback based cross domain video concept detection

Proceedings of the Third International Conference on Internet Multimedia Computing and Service
Towards multilingual user models for Personalized Multilingual Information Retrieval

Proceedings of the First Workshop on Personalised Multilingual Hypertext Retrieval
A study on query expansion methods for patent retrieval

Proceedings of the 4th workshop on Patent information retrieval
A Survey of Automatic Query Expansion in Information Retrieval

ACM Computing Surveys (CSUR)
A two-stage decision model for information filtering

Decision Support Systems
Effective query formulation with multiple information sources

Proceedings of the fifth ACM international conference on Web search and data mining
Improving retrievability of patents in prior-art search

ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval
Improving retrievability with improved cluster-based pseudo-relevance feedback selection

Expert Systems with Applications: An International Journal
Boosting web video categorization with contextual information from social web

World Wide Web
Query phrase expansion using wikipedia in patent class search

AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
An efficient framework for constructing generalized locally-induced text metrics

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Two
Support vector machines for anti-pattern detection

Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering
Learning-Based pseudo-relevance feedback for patent retrieval

IRFC'12 Proceedings of the 5th conference on Multidisciplinary Information Retrieval
Discovering relevant features for effective query formulation

IRFC'12 Proceedings of the 5th conference on Multidisciplinary Information Retrieval
Structured event retrieval over microblog archives

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Exploiting External Collections for Query Expansion

ACM Transactions on the Web (TWEB)
Learning lexicon models from search logs for query expansion

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Generating event storylines from microblogs

Proceedings of the 21st ACM international conference on Information and knowledge management
Selecting expansion terms as a set via integer linear programming

Proceedings of the 21st ACM international conference on Information and knowledge management
Hidden markov model for term weighting in verbose queries

CLEF'12 Proceedings of the Third international conference on Information Access Evaluation: multilinguality, multimodality, and visual analytics
High performance query expansion using adaptive co-training

Information Processing and Management: an International Journal
Improving image tags by exploiting web search results

Multimedia Tools and Applications
Semantic Query Expansion using Cluster Based Domain Ontologies

International Journal of Information Retrieval Research
Modeling reformulation using query distributions

ACM Transactions on Information Systems (TOIS)
Relevance Feedback Fusion via Query Expansion

WI-IAT '12 Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 03
Ontology-based personalised retrieval in support of reminiscence

Knowledge-Based Systems
An incremental approach to efficient pseudo-relevance feedback

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Query expansion using path-constrained random walks

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Modeling click-through based word-pairs for web search

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Query change as relevance feedback in session search

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Assisting code search with automatic query reformulation for bug localization

Proceedings of the 10th Working Conference on Mining Software Repositories
On using inter-document relations in microblog retrieval

Proceedings of the 22nd international conference on World Wide Web companion
Interactive exploratory search for multi page search results

Proceedings of the 22nd international conference on World Wide Web
Selecting effective expansion terms for diversity

Proceedings of the 10th Conference on Open Research Areas in Information Retrieval
Improving pseudo-relevance feedback via tweet selection

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Learning to handle negated language in medical records search

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Predicting the impact of expansion terms using semantic and user interaction features

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Social semantic query expansion

ACM Transactions on Intelligent Systems and Technology (TIST) - Survey papers, special sections on the semantic adaptive social web, intelligent systems for health informatics, regular papers
Towards Concept-Based Translation Models Using Search Logs for Query Expansion

Proceedings of the 21st ACM international conference on Information and knowledge management
An Ontology-Based Query Expansion for an Agricultural Expert Retrieval System

Proceedings of International Conference on Information Integration and Web-based Applications & Services
Hybrid pseudo-relevance feedback for microblog retrieval

Journal of Information Science
Personalised Information Retrieval: survey and classification

User Modeling and User-Adapted Interaction
Latent word context model for information retrieval

Information Retrieval

Quantified Score

Hi-index	0.00

Abstract

Pseudo-relevance feedback assumes that the most frequent terms in the pseudo-feedback documents are useful for retrieval. In this study, we re-examine this assumption and show that it does not hold in practice: many expansion terms identified by traditional approaches are in fact unrelated to the query and harmful to retrieval. We also show that good expansion terms cannot be distinguished from bad ones merely by their distributions in the feedback documents and in the whole collection. We therefore propose integrating a term classification process to predict the usefulness of expansion terms; multiple additional features can be integrated into this process. Our experiments on three TREC collections show that retrieval effectiveness can be much improved when term classification is used. We also demonstrate that good terms should be identified directly according to their possible impact on retrieval effectiveness, i.e., using supervised rather than unsupervised learning.
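
The idea of judging an expansion term by its impact on retrieval effectiveness can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy collection, the weighted term-count scoring function, the expansion weight of 0.5, and the helper names `candidate_terms` and `label_terms`. Each candidate term is added to the query on its own, and it is labeled good, bad, or neutral depending on whether average precision rises, falls, or stays the same.

```python
from collections import Counter

def score(query_weights, doc_tokens):
    """Toy scoring (an assumption): weighted count of query terms in the document."""
    counts = Counter(doc_tokens)
    return sum(w * counts[t] for t, w in query_weights.items())

def rank(query_weights, docs):
    """Document ids by descending score; ties are broken by id."""
    return sorted(docs, key=lambda d: (-score(query_weights, docs[d]), d))

def average_precision(ranking, relevant):
    """Average precision of a ranking against a set of relevant ids."""
    hits, total = 0, 0.0
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0

def candidate_terms(query, docs, k=2, n=5):
    """Most frequent non-query terms in the top-k pseudo-feedback documents."""
    base = {t: 1.0 for t in query}
    top = rank(base, docs)[:k]
    counts = Counter(t for d in top for t in docs[d] if t not in query)
    return [t for t, _ in counts.most_common(n)]

def label_terms(query, docs, relevant, terms, weight=0.5):
    """Label each term by the change in average precision when it is added alone."""
    base = {t: 1.0 for t in query}
    base_ap = average_precision(rank(base, docs), relevant)
    labels = {}
    for term in terms:
        expanded = dict(base)
        expanded[term] = weight  # hypothetical expansion weight
        ap = average_precision(rank(expanded, docs), relevant)
        if ap > base_ap:
            labels[term] = "good"
        elif ap < base_ap:
            labels[term] = "bad"
        else:
            labels[term] = "neutral"
    return labels

# Toy collection (hypothetical): the query is about the car, not the animal.
docs = {
    "d1": ["jaguar", "cat", "jungle"],
    "d2": ["jaguar", "car", "engine"],
    "d3": ["jaguar", "car", "dealer"],
    "d4": ["jaguar", "jungle", "cat", "prey"],
}
query = ["jaguar"]
relevant = {"d2", "d3"}

candidates = candidate_terms(query, docs)
labels = label_terms(query, docs, relevant, candidates)
print(labels)
```

In this toy setting all four candidates are equally frequent in the feedback documents, yet only some of them help, which mirrors the abstract's point that frequency alone does not separate good terms from bad ones. Labels of this kind could then serve as training targets for a term classifier; the features and the learner are left open here.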