Active relevance feedback for difficult queries

Authors:
Zuobing Xu; Ram Akella
Affiliations:
University of California, Santa Cruz, Santa Cruz, CA, USA; University of California, Santa Cruz, Santa Cruz, CA, USA
Venue:
Proceedings of the 17th ACM conference on Information and knowledge management
Year:
2008

Abstract

Relevance feedback has been shown to be an effective strategy for improving retrieval accuracy. Existing relevance feedback algorithms based on language models and vector space models do not learn effectively from negative feedback documents, which are abundant when the initial query is difficult. The probabilistic retrieval model can naturally improve the estimates of both the relevant and the non-relevant models. The Dirichlet compound multinomial (DCM) distribution, which relies on hierarchical Bayesian modeling techniques, is a more appropriate generative model for the probabilistic retrieval model than the traditional multinomial distribution. We propose a new relevance feedback algorithm, based on a mixture model of the DCM distribution, that effectively models the overlap between positive and negative feedback documents. Consequently, the new algorithm substantially improves retrieval performance for difficult queries. To further reduce the human relevance-judgment effort, we propose a new active learning algorithm that works in conjunction with the new relevance feedback model. The new active learning algorithm implicitly models the diversity, density, and relevance of unlabeled data in a transductive experimental design framework. Experimental results on several TREC datasets show that both the relevance feedback algorithm and the active learning algorithm significantly improve retrieval accuracy.
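
To make the DCM distribution named in the abstract concrete, the following minimal sketch evaluates the standard DCM (Dirichlet-multinomial) log-probability of a word-count vector and uses it in a log-odds comparison between a relevant and a non-relevant model. It is an illustration under stated assumptions, not the authors' algorithm: the function names, the toy three-word vocabulary, and the parameter values are hypothetical, and the mixture model and active learning steps described in the abstract are not implemented here.

```python
import math


def dcm_log_prob(counts, alpha):
    """Log-probability of a word-count vector under a DCM distribution.

    counts: non-negative integer word counts for one document.
    alpha:  positive Dirichlet parameters, one per vocabulary word.
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)       # document length
    a = sum(alpha)        # total concentration
    # Multinomial coefficient: log n! - sum_w log x_w!
    log_p = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Normalizer of the Dirichlet-multinomial: Gamma(a) / Gamma(a + n)
    log_p += math.lgamma(a) - math.lgamma(a + n)
    # Per-word terms: Gamma(x_w + alpha_w) / Gamma(alpha_w)
    log_p += sum(math.lgamma(x + al) - math.lgamma(al)
                 for x, al in zip(counts, alpha))
    return log_p


def log_odds_score(counts, alpha_rel, alpha_nonrel):
    """Illustrative ranking score (an assumption, not the paper's formula):
    log-likelihood ratio of the relevant versus the non-relevant DCM model."""
    return dcm_log_prob(counts, alpha_rel) - dcm_log_prob(counts, alpha_nonrel)


if __name__ == "__main__":
    # Hypothetical parameters over a toy three-word vocabulary.
    alpha_rel = [2.0, 0.5, 0.5]      # relevant model favors word 0
    alpha_nonrel = [0.5, 0.5, 2.0]   # non-relevant model favors word 2
    doc = [3, 1, 0]                  # document dominated by word 0
    print(log_odds_score(doc, alpha_rel, alpha_nonrel))
```

In this sketch a document dominated by the word that the relevant model favors receives a positive score, and the mirrored document receives a negative one. Estimating the parameters from positive and negative feedback documents is the part the paper addresses and is left out here.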