Proximity among query terms has been found to be useful for improving retrieval performance. However, its application to classical probabilistic information retrieval models, such as Okapi's BM25, remains a challenging research problem. In this paper, we propose to improve the classical BM25 model by utilizing term proximity evidence. Four novel methods are proposed to model the proximity between query terms: a window-based N-gram Counting method, and Survival Analysis over three different statistics, namely the Poisson process, an exponential distribution, and an empirical function. Through extensive experiments on standard TREC collections, our proposed proximity-based BM25 model, called BM25P, is compared to strong state-of-the-art baselines, including the original unigram BM25 model, the Markov Random Field model, and the positional language model. According to the experimental results, the window-based N-gram Counting method and Survival Analysis over an exponential distribution are the most effective of the four proposed methods, leading to marked improvement over the baselines. This shows that the use of term proximity considerably enhances the retrieval effectiveness of classical probabilistic models. It is therefore recommended to deploy a term proximity component in retrieval systems that employ probabilistic models.
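
To make the idea of window-based proximity evidence concrete, the following is a minimal sketch of counting how often pairs of query terms co-occur within a fixed-size window of a document. It is an illustration only and is not the BM25P formulation: the function name `window_pair_counts`, the `window` parameter, and the rule that two occurrences count when their positions differ by less than `window` are all assumptions made for this example.

```python
from itertools import combinations


def window_pair_counts(doc_tokens, query_terms, window=5):
    """Count close co-occurrences of query-term pairs in a tokenized document.

    Hypothetical helper for illustration: for every unordered pair of distinct
    query terms, count the position pairs (one position per term) whose
    distance is strictly less than `window`.
    """
    # Record the positions at which each distinct query term occurs.
    positions = {term: [] for term in set(query_terms)}
    for index, token in enumerate(doc_tokens):
        if token in positions:
            positions[token].append(index)

    # Compare positions of every pair of distinct query terms.
    counts = {}
    for term_a, term_b in combinations(sorted(positions), 2):
        counts[(term_a, term_b)] = sum(
            1
            for pos_a in positions[term_a]
            for pos_b in positions[term_b]
            if abs(pos_a - pos_b) < window
        )
    return counts


if __name__ == "__main__":
    text = ("okapi bm25 ranks documents using term frequency and "
            "bm25 proximity among query term matches")
    print(window_pair_counts(text.split(), ["term", "proximity"], window=5))
```

Counts of this kind could then be weighted and combined with a unigram BM25 score; how that combination is done is left open here, since the abstract does not specify it.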