Modeling score distributions for combining the outputs of search engines

Authors:
R. Manmatha;T. Rath;F. Feng
Affiliations:
Univ. of Massachusetts, Amherst;Univ. of Massachusetts, Amherst;Univ. of Massachusetts, Amherst
Venue:
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2001


Abstract

In this paper, the score distributions of a number of text search engines are modeled. It is shown empirically that, on a per-query basis, the score distribution may be fitted using an exponential distribution for the set of non-relevant documents and a normal distribution for the set of relevant documents. Experiments show that this model fits TREC-3 and TREC-4 data not only for probabilistic search engines like INQUERY but also for vector space search engines like SMART for English. We have also used this model to fit the output of other search engines, such as LSI search engines, and of search engines indexing other languages, such as Chinese.

It is then shown that, given a query for which relevance information is not available, a mixture model consisting of an exponential and a normal distribution can be fitted to the score distribution. These distributions can be used to map the scores of a search engine to probabilities. We also discuss how the shape of the score distributions arises given certain assumptions about word distributions in documents. We hypothesize that all 'good' text search engines operating on any language have similar characteristics.

This model has many possible applications. For example, the outputs of different search engines can be combined by averaging the probabilities (optimal if the search engines are independent) or by using the probabilities to select the best engine for each query. Results show that the technique performs as well as the best current combination techniques.
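The pipeline described in the abstract involves three steps: fit an exponential-plus-normal mixture to one query's scores without relevance labels, map each score to a probability of relevance, and average those probabilities across engines. It can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name a fitting procedure, so expectation-maximization (EM) is used here. Scores are assumed to be shifted so the minimum is 0. The top-10% initialization and the rule that a document missing from one engine's list counts as probability 0 are hypothetical choices. The names `fit_mixture`, `posterior_relevance`, and `combine_by_average` are illustrative only.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

# (w, lam, mu, sigma): weight of the relevant (normal) component,
# exponential rate, normal mean, normal standard deviation.
Params = Tuple[float, float, float, float]


def exp_pdf(x: float, lam: float) -> float:
    """Exponential density, used for non-relevant scores (support x >= 0)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def norm_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density, used for relevant scores."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_relevance(s: float, params: Params) -> float:
    """P(relevant | score s) by Bayes' rule under the fitted mixture."""
    w, lam, mu, sigma = params
    rel = w * norm_pdf(s, mu, sigma)
    non = (1.0 - w) * exp_pdf(s, lam)
    if rel + non == 0.0:
        # Both densities underflowed: fall back to which side of mu s lies on.
        return 1.0 if s > mu else 0.0
    return rel / (rel + non)


def fit_mixture(scores: Sequence[float], n_iter: int = 200) -> Params:
    """EM fit of p(s) = (1 - w) * Exp(s; lam) + w * N(s; mu, sigma).

    Assumes scores were already shifted so that the minimum is 0.
    """
    n = len(scores)
    # Hypothetical initialization: seed the normal component with the
    # top 10% of scores and the exponential rate with 1 / overall mean.
    k = max(2, n // 10)
    top = sorted(scores)[-k:]
    w = k / n
    mu = statistics.fmean(top)
    sigma = max(statistics.pstdev(top), 1e-3)
    lam = 1.0 / max(statistics.fmean(scores), 1e-12)
    for _ in range(n_iter):
        # E-step: responsibility of the relevant (normal) component per score.
        r = [posterior_relevance(s, (w, lam, mu, sigma)) for s in scores]
        n_rel = sum(r)
        if n_rel < 1e-9 or n_rel > n - 1e-9:
            break  # one component vanished; keep the last estimates
        # M-step: responsibility-weighted maximum-likelihood updates.
        w = n_rel / n
        mu = sum(ri * s for ri, s in zip(r, scores)) / n_rel
        var = sum(ri * (s - mu) ** 2 for ri, s in zip(r, scores)) / n_rel
        sigma = max(math.sqrt(var), 1e-3)
        weighted_sum = sum((1.0 - ri) * s for ri, s in zip(r, scores))
        lam = (n - n_rel) / max(weighted_sum, 1e-12)
    return w, lam, mu, sigma


def combine_by_average(runs: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Average per-engine relevance probabilities and rank documents.

    A document missing from an engine's list contributes 0 (assumption).
    """
    docs = set().union(*runs)
    fused = {d: sum(run.get(d, 0.0) for run in runs) / len(runs) for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

In use, each engine's scores for a query would be passed to `fit_mixture`. Every score would then go through `posterior_relevance`, and the resulting per-engine dictionaries would go to `combine_by_average`. As the abstract notes, averaging is optimal only if the engines are independent. The same probabilities could instead be used to pick the best single engine for each query.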