Viewing term proximity from a different perspective

Authors:
Ruihua Song;Michael J. Taylor;Ji-Rong Wen;Hsiao-Wuen Hon;Yong Yu
Affiliations:
Dept. of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China and Microsoft Research Asia, Beijing, China;Microsoft Research Ltd, Cambridge, England;Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China;Dept. of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Venue:
ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
Year:
2008

References (17)

Automatic phrase indexing for document retrieval
SIGIR '87 Proceedings of the 10th annual international ACM SIGIR conference on Research and development in information retrieval

The use of phrases and structured queries in information retrieval
SIGIR '91 Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval

Term dependence: truncating the Bahadur Lazarsfeld expansion
Information Processing and Management: an International Journal

The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7

A general language model for information retrieval
Proceedings of the eighth international conference on Information and knowledge management

Relevance ranking for one to three term queries
Information Processing and Management: an International Journal

Experimentation as a way of life: Okapi at TREC
Information Processing and Management: an International Journal - The sixth text REtrieval conference (TREC-6)

Searching the Web: the public and their queries
Journal of the American Society for Information Science and Technology

A stop list for general text
ACM SIGIR Forum

Biterm language models for document retrieval
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval

Capturing term dependencies using a language model based on sentence trees
Proceedings of the eleventh international conference on Information and knowledge management

A Generalized Term Dependence Model in Information Retrieval

Dependence language model for information retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval

A Markov random field model for term dependencies
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

Term proximity scoring for ad-hoc retrieval on very large text collections
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

Term proximity scoring for keyword-based retrieval systems
ECIR'03 Proceedings of the 25th European conference on IR research

Boosting web retrieval through query operations
ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research

Cited by (17)

Positional language models for information retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

A machine learning approach for improved BM25 retrieval
Proceedings of the 18th ACM conference on Information and knowledge management

The Probabilistic Relevance Framework: BM25 and Beyond
Foundations and Trends in Information Retrieval

Focused retrieval with proximity scoring
Proceedings of the 2010 ACM Symposium on Applied Computing

How good is a span of terms?: exploiting proximity to improve web retrieval
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval

Efficient term proximity search with term-pair indexes
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

ENSM-SE at INEX 2009: scoring with proximity and semantic tag information
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval

Modeling term proximity for probabilistic information retrieval models
Information Sciences: an International Journal

Parameterized concept weighting in verbose queries
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval

High-performance processing of text queries with tunable pruned term and term pair indexes
ACM Transactions on Information Systems (TOIS)

Effective query formulation with multiple information sources
Proceedings of the fifth ACM international conference on Web search and data mining

Personalized query expansion in the QIC system
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries

Natural language technology and query expansion: issues, state-of-the-art and perspectives
Journal of Intelligent Information Systems

Combining relevancy and methodological quality into a single ranking for evidence-based medicine
Information Sciences: an International Journal

Proximity-based Rocchio's model for pseudo relevance
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval

Enhancement of passage scorers by proximity-based term occurrence weighting
International Journal of Intelligent Information and Database Systems

Semantic concept-enriched dependence model for medical information retrieval
Journal of Biomedical Informatics

Abstract

This paper extends the state-of-the-art probabilistic model BM25 to utilize term proximity from a new perspective. Most previous work considers only dependencies between pairs of terms and regards phrases as additional, independent evidence. It is difficult to estimate the importance of a phrase and its extra contribution to a relevance score, because the phrase overlaps with its component terms. This paper proposes a new approach. First, query terms are grouped locally into non-overlapping phrases, each of which may contain one or more query terms. Second, these phrases are not scored independently; instead, they are treated as providing a context for their component query terms. The relevance contribution of a term occurrence is measured by how many query terms occur in its context phrase and how compactly they occur. Third, we replace term frequency with the accumulated relevance contribution. Consequently, term proximity is easily integrated into the probabilistic model. Experimental results on the TREC-10 and TREC-11 collections show stable improvements in average precision and significant improvements in precision at top ranks.
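The three steps in the abstract can be illustrated with a small Python sketch. It is not the paper's algorithm. The segmentation rule (start a new phrase when the gap to the previous query-term hit exceeds a hypothetical `max_gap`, or when a term repeats), the contribution function `(n / width) ** x * n ** y`, and every parameter value are assumptions chosen for illustration. What the sketch does show is the core idea: compact co-occurrence raises a term's accumulated contribution above its raw frequency, while an isolated occurrence contributes exactly 1. As a result, the BM25 formula falls back to its usual form when no query terms appear near each other.

```python
import math
from collections import defaultdict

# All parameter values below (max_gap, x, y, k1, b) are illustrative
# assumptions, not values taken from the paper.

def segment_hits(doc_tokens, query_terms, max_gap=3):
    """Split query-term occurrences into non-overlapping phrases (spans).

    Simplified, assumed rule: start a new span when the next hit lies more
    than max_gap positions after the previous hit, or when its term already
    occurs in the current span.
    """
    qset = set(query_terms)
    hits = [(pos, tok) for pos, tok in enumerate(doc_tokens) if tok in qset]
    spans, current = [], []
    for pos, tok in hits:
        seen = {t for _, t in current}
        if current and (pos - current[-1][0] > max_gap or tok in seen):
            spans.append(current)
            current = []
        current.append((pos, tok))
    if current:
        spans.append(current)
    return spans


def span_contribution(span, x=0.25, y=0.3):
    """Contribution credited to each hit in a span (assumed form).

    Denser spans and spans holding more query terms score higher;
    a span with a single hit scores exactly 1.0.
    """
    n = len(span)                          # query terms in the span
    width = span[-1][0] - span[0][0] + 1   # token positions the span covers
    return (n / width) ** x * n ** y


def relevance_contributions(doc_tokens, query_terms, max_gap=3, x=0.25, y=0.3):
    """Sum each term's contribution over the spans it occurs in.

    The result stands in for raw term frequency.
    """
    rc = defaultdict(float)
    for span in segment_hits(doc_tokens, query_terms, max_gap):
        c = span_contribution(span, x, y)
        for _, tok in span:
            rc[tok] += c
    return dict(rc)


def bm25_rc(doc_tokens, query_terms, df, n_docs, avg_len, k1=1.2, b=0.75):
    """BM25 with tf replaced by the accumulated relevance contribution."""
    rc = relevance_contributions(doc_tokens, query_terms)
    norm = k1 * ((1 - b) + b * len(doc_tokens) / avg_len)
    score = 0.0
    for t in set(query_terms):
        r = rc.get(t, 0.0)
        if r == 0.0:
            continue  # term absent from the document contributes nothing
        idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5))
        score += idf * (k1 + 1) * r / (norm + r)
    return score


# Usage: both documents have the same length and the same raw term
# frequencies; only the proximity of the two query terms differs.
query = ["term", "proximity"]
df = {"term": 10, "proximity": 5}  # hypothetical document frequencies
near = ["term", "proximity", "a", "b", "c", "d", "e"]
far = ["term", "a", "b", "c", "d", "e", "proximity"]
print(bm25_rc(near, query, df, 1000, 7.0) > bm25_rc(far, query, df, 1000, 7.0))  # True
```

Keeping one contribution per span, shared by all of that span's terms, mirrors the abstract's point that phrases are not scored as separate evidence but only reweight their component term occurrences.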