A formal approach to score normalization for meta-search

Authors:
R. Manmatha;H. Sever
Affiliations:
University of Massachusetts, Amherst, MA;University of Massachusetts, Amherst, MA
Venue:
HLT '02 Proceedings of the second international conference on Human Language Technology Research
Year:
2002


Abstract

Meta-search, or the combination of the outputs of different search engines in response to a query, has been shown to improve performance. Since the scores produced by different search engines are not comparable, researchers have often decomposed the meta-search problem into a score normalization step followed by a combination step. Combination has been studied by many researchers. While appropriate normalization can affect performance, most of the normalization schemes suggested are ad hoc in nature. In this paper, we propose a formal approach to normalizing scores for meta-search by taking the distributions of the scores into account. Recently, it has been shown that for search engines the score distributions for a given query may be modeled using an exponential distribution for the set of non-relevant documents and a normal distribution for the set of relevant documents. Here, it is shown that by equalizing the distributions of scores of the top non-relevant documents, the best meta-search performance reported in the literature is obtained. Since relevance information is not available a priori, we discuss two different ways of obtaining a good approximation to the distribution of scores of non-relevant documents. One is obtained by looking at the distribution of scores of all documents. The second is obtained by fitting a mixture model of an exponential and a Gaussian to the scores of all documents and using the resulting exponential distribution as an estimate of the non-relevant distribution. We show with experiments on TREC-3, TREC-4 and TREC-9 data that the best combination results are obtained by averaging the parameters obtained from these approximations. These techniques work on a variety of different search engines, including vector space search engines like SMART and probabilistic search engines like INQUERY.
The problem of normalization is important in many other areas including information filtering, topic detection and tracking, multilingual search and distributed retrieval. Thus, the techniques proposed here are likely to be applicable to many of these tasks.
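
The normalization idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the estimation procedure or the exact normalization formula, so everything below is an assumption made for illustration. The sketch assumes scores are shifted so the minimum is zero, fits the exponential-plus-Gaussian mixture with a plain EM loop, takes "averaging the parameters" to mean averaging the exponential mean estimated from all documents with the one from the mixture, and rescales scores so that the estimated non-relevant distribution has unit mean. The function names `fit_exp_gauss_mixture` and `normalize`, the initialisation, and the synthetic data are all hypothetical.

```python
import math
import random


def fit_exp_gauss_mixture(scores, iters=200):
    """EM fit of p(s) = w*Exp(s; lam) + (1-w)*Normal(s; mu, sigma), s >= 0."""
    n = len(scores)
    # Crude initialisation (an assumption): most documents are non-relevant,
    # and the relevant ones sit among the top 10% of scores.
    top = sorted(scores)[int(0.9 * n):]
    w, lam = 0.9, n / max(sum(scores), 1e-300)
    mu = sum(top) / len(top)
    sigma = max(1e-9, math.sqrt(sum((s - mu) ** 2 for s in top) / len(top)))
    for _ in range(iters):
        # E-step: responsibility of the exponential component for each score.
        resp = []
        for s in scores:
            pe = w * lam * math.exp(-lam * s)
            pg = (1 - w) * math.exp(-0.5 * ((s - mu) / sigma) ** 2) / (
                sigma * math.sqrt(2 * math.pi))
            resp.append(pe / (pe + pg + 1e-300))
        # M-step: weighted maximum-likelihood updates of both components.
        ne = sum(resp)
        ng = max(n - ne, 1e-12)
        w = ne / n
        lam = ne / max(sum(r * s for r, s in zip(resp, scores)), 1e-300)
        mu = sum((1 - r) * s for r, s in zip(resp, scores)) / ng
        var = sum((1 - r) * (s - mu) ** 2 for r, s in zip(resp, scores)) / ng
        sigma = max(1e-9, math.sqrt(var))
    return w, lam, mu, sigma


def normalize(scores):
    """Rescale one engine's scores so the estimated non-relevant mean is 1."""
    lo = min(scores)
    shifted = [s - lo for s in scores]
    # Approximation 1: treat all documents as non-relevant.
    mean_all = sum(shifted) / len(shifted)
    # Approximation 2: exponential component of the fitted mixture.
    _, lam, _, _ = fit_exp_gauss_mixture(shifted)
    mean_mix = 1.0 / lam
    # Assumed reading of "averaging the parameters" from the abstract.
    scale = 0.5 * (mean_all + mean_mix)
    return [s / scale for s in shifted]


# Synthetic example: two hypothetical engines that rank identically but
# report scores on different scales (engine_b is engine_a times 50).
random.seed(0)
engine_a = ([random.expovariate(0.5) for _ in range(950)]
            + [random.gauss(10, 1) for _ in range(50)])
engine_b = [50 * s for s in engine_a]
norm_a, norm_b = normalize(engine_a), normalize(engine_b)
# Largest disagreement between the two normalized score lists.
print(max(abs(x - y) for x, y in zip(norm_a, norm_b)))
```

After normalization, scores from the two synthetic engines land on a common scale and could be passed to any combination step, such as summing the normalized scores per document. The choice of combination rule is outside this sketch.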