Extending BM25 with multiple query operators

Authors:
Roi Blanco;Paolo Boldi
Affiliations:
Yahoo! Research, Barcelona, Spain;Università degli Studi, Milano, Italy
Venue:
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Year:
2012

Citing 30
Cited 1

Some inconsistencies and misnomers in probabilistic information retrieval

SIGIR '91 Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval
Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
A language modeling approach to information retrieval

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Latent dirichlet allocation

The Journal of Machine Learning Research
A study of smoothing methods for language models applied to information retrieval

ACM Transactions on Information Systems (TOIS)
Simple BM25 extension to multiple weighted fields

Proceedings of the thirteenth ACM international conference on Information and knowledge management
On Event Spaces and Probabilistic Models in Information Retrieval

Information Retrieval
Relevance weighting for query independent evidence

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
A Markov random field model for term dependencies

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Markov logic networks

Machine Learning
Learning user interaction models for predicting web search result preferences

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Term proximity scoring for ad-hoc retrieval on very large text collections

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
SoftRank: optimizing non-smooth rank metrics

WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
Unsupervised query segmentation using generative language models and wikipedia

Proceedings of the 17th international conference on World Wide Web
Positional language models for information retrieval

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Two-stage query segmentation for information retrieval

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Unifying logical and statistical AI

AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Learning to Rank for Information Retrieval

Foundations and Trends in Information Retrieval
Improving web search relevance with semantic features

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
The Probabilistic Relevance Framework: BM25 and Beyond

Foundations and Trends in Information Retrieval
Learning concept importance using a weighted dependence model

Proceedings of the third ACM international conference on Web search and data mining
Learning to rank with (a lot of) word features

Information Retrieval
How good is a span of terms?: exploiting proximity to improve web retrieval

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Positional relevance model for pseudo-relevance feedback

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
The power of naive query segmentation

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Query intent prediction and recommendation

Proceedings of the fourth ACM conference on Recommender systems
Unsupervised query segmentation using clickthrough for information retrieval

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
An analysis of web proxy logs with query distribution pattern approach for search engines

Computer Standards & Interfaces
A log-logistic model-based interpretation of TF normalization of BM25

ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval

Web usage mining with semantic analysis

Proceedings of the 22nd international conference on World Wide Web

Quantified Score

Hi-index	0.00

Visualization

Abstract

Traditional probabilistic relevance frameworks for informational retrieval refrain from taking positional information into account, due to the hurdles of developing a sound model while avoiding an explosion in the number of parameters. Nonetheless, the well-known BM25F extension of the successful Okapi ranking function can be seen as an embryonic attempt in that direction. In this paper, we proceed along the same line, defining the notion of virtual region: a virtual region is a part of the document that, like a BM25F-field, can provide a (larger or smaller, depending on a tunable weighting parameter) evidence of relevance of the document; differently from BM25F fields, though, virtual regions are generated implicitly by applying suitable (usually, but not necessarily, positional-aware) operators to the query. This technique fits nicely in the eliteness model behind BM25 and provides a principled explanation to BM25F; it specializes to BM25(F) for some trivial operators, but has a much more general appeal. Our experiments (both on standard collections, such as TREC, and on Web-like repertoires) show that the use of virtual regions is beneficial for retrieval effectiveness.