How good is a span of terms?: exploiting proximity to improve web retrieval

Authors:
Krysta M. Svore; Pallika H. Kanani; Nazan Khan
Affiliations:
Microsoft Research, Redmond, WA, USA; UMass, Amherst, Amherst, MA, USA; Microsoft, Redmond, WA, USA
Venue:
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Year:
2010

Abstract

Ranking search results is a fundamental problem in information retrieval. In this paper we explore whether the use of proximity and phrase information can improve web retrieval accuracy. We build on existing research by incorporating novel ranking features based on flexible proximity terms with recent state-of-the-art machine learning ranking models. We introduce a method of determining the goodness of a set of proximity terms that takes advantage of the structured nature of web documents, document metadata, and phrasal information from search engine user query logs. We perform experiments on a large real-world Web data collection and show that using the goodness score of flexible proximity terms can improve ranking accuracy over state-of-the-art ranking methods by as much as 13%. We also show that we can improve accuracy on the hardest queries by as much as 9% relative to state-of-the-art approaches.