Improving the estimation of relevance models using large external corpora

  • Authors:
  • Fernando Diaz
  • Donald Metzler

  • Affiliations:
  • University of Massachusetts, Amherst, MA
  • University of Massachusetts, Amherst, MA

  • Venue:
  • SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2006

Abstract

Information retrieval algorithms leverage various collection statistics to improve performance. Because these statistics are often computed on a relatively small evaluation corpus, we believe using larger, non-evaluation corpora should improve performance. Specifically, we advocate incorporating external corpora based on language modeling. We refer to this process as external expansion. When compared to traditional pseudo-relevance feedback techniques, external expansion is more stable across topics and up to 10% more effective in terms of mean average precision. Our results show that using a high-quality corpus comparable to the evaluation corpus can be as effective as, if not more effective than, using the web. Our results also show that external expansion outperforms simulated relevance feedback. In addition, we propose a method for predicting the extent to which external expansion will improve retrieval performance. Our new measure demonstrates positive correlation with improvements in mean average precision.
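To make the idea of external expansion concrete, the sketch below estimates a relevance model from the top-ranked documents of an *external* corpus and interpolates it with the original query model. This is a minimal illustration, not the authors' implementation: the toy corpus, the Dirichlet prior `mu`, the interpolation weight `lam`, and the top-`k` cutoff are all illustrative choices.

```python
import math
from collections import Counter

def query_likelihood(query, doc_tf, doc_len, coll_tf, coll_len, mu=10.0):
    """Log P(Q|D) under a Dirichlet-smoothed document language model."""
    score = 0.0
    for w in query:
        p = (doc_tf.get(w, 0) + mu * coll_tf.get(w, 0) / coll_len) / (doc_len + mu)
        score += math.log(p)
    return score

def relevance_model(query, corpus, k=3, mu=10.0):
    """Estimate P(w|R) from the top-k documents of `corpus`:
    P(w|R) is proportional to sum over D of P(w|D) * P(Q|D)."""
    docs = [d.split() for d in corpus]
    coll_tf = Counter(w for d in docs for w in d)
    coll_len = sum(coll_tf.values())
    scored = []
    for d in docs:
        tf = Counter(d)
        p_q_d = math.exp(query_likelihood(query, tf, len(d), coll_tf, coll_len, mu))
        scored.append((p_q_d, tf, len(d)))
    scored.sort(key=lambda t: t[0], reverse=True)
    rm = Counter()
    for p_q_d, tf, dlen in scored[:k]:
        for w, c in tf.items():
            rm[w] += p_q_d * c / dlen  # P(w|D) weighted by query likelihood
    z = sum(rm.values())
    return {w: v / z for w, v in rm.items()} if z else {}

def external_expansion(query, external_corpus, lam=0.5, k=3):
    """Interpolate the maximum-likelihood query model with a relevance
    model estimated on the external (non-evaluation) corpus."""
    q_mle = Counter(query)
    rm = relevance_model(query, external_corpus, k=k)
    vocab = set(q_mle) | set(rm)
    return {w: lam * q_mle.get(w, 0) / len(query) + (1 - lam) * rm.get(w, 0)
            for w in vocab}
```

The expanded model can then be used to rank documents in the (smaller) evaluation corpus; in this sketch, a query such as `["apple"]` run against an external corpus of fruit-related documents would shift probability mass onto co-occurring terms like "fruit".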