Modeling term proximity for probabilistic information retrieval models

Authors:
Ben He;Jimmy Xiangji Huang;Xiaofeng Zhou
Affiliations:
Information Retrieval and Knowledge Management Research Lab, School of Information Technology, York University, Toronto, Canada;Information Retrieval and Knowledge Management Research Lab, School of Information Technology, York University, Toronto, Canada;Information Retrieval and Knowledge Management Research Lab, School of Information Technology, York University, Toronto, Canada
Venue:
Information Sciences: an International Journal
Year:
2011

Abstract

Proximity among query terms has been found to be useful for improving retrieval performance. However, its application to classical probabilistic information retrieval models, such as Okapi's BM25, remains a challenging research problem. In this paper, we propose to improve the classical BM25 model by utilizing term proximity evidence. Four novel methods are proposed to model the proximity between query terms: a window-based N-gram Counting method, and Survival Analysis over three different statistics, namely the Poisson process, an exponential distribution, and an empirical function. Through extensive experiments on standard TREC collections, our proposed proximity-based BM25 model, called BM25P, is compared to strong state-of-the-art baselines, including the original unigram BM25 model, the Markov Random Field model, and the positional language model. According to the experimental results, the window-based N-gram Counting method and Survival Analysis over an exponential distribution are the most effective of the four proposed methods, leading to marked improvements over the baselines. This shows that the use of term proximity considerably enhances the retrieval effectiveness of classical probabilistic models. It is therefore recommended to deploy a term proximity component in retrieval systems that employ probabilistic models.
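
As a rough illustration of what window-based counting of query term co-occurrences can look like, the following is a minimal sketch and not the BM25P formulation from the paper. The function names, the sliding-window definition, the window size, and the BM25-style saturation with `k1 = 1.2` are all assumptions made for this example; the abstract does not specify them.

```python
from typing import List


def count_cooccurrence_windows(tokens: List[str], term_a: str,
                               term_b: str, window: int) -> int:
    """Count sliding windows of `window` tokens that contain both terms.

    Hypothetical helper: the exact window definition is an assumption.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if not tokens:
        return 0
    # A document shorter than the window is treated as one single window.
    n_windows = max(len(tokens) - window + 1, 1)
    count = 0
    for start in range(n_windows):
        span = tokens[start:start + window]
        if term_a in span and term_b in span:
            count += 1
    return count


def saturate(count: float, k1: float = 1.2) -> float:
    """BM25-style saturation of a raw count (assumed form, no length
    normalization), so repeated co-occurrences give diminishing returns."""
    return count * (k1 + 1.0) / (count + k1)


doc = ("term proximity helps retrieval when query term "
       "and proximity evidence agree").split()
pair_count = count_cooccurrence_windows(doc, "term", "proximity", window=3)
proximity_evidence = saturate(pair_count)
```

In a sketch like this, the saturated pair count would be combined with the ordinary unigram BM25 score of the document; how that combination is weighted is left open here because the abstract does not describe it.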