Comparison of Normalization Techniques for Metasearch

  • Authors: Hayri Sever, Mehmet R. Tolun

  • Venue: ADVIS '02, Proceedings of the Second International Conference on Advances in Information Systems
  • Year: 2002

Abstract

It is a well-known fact that combining the retrieval outputs of different search systems in response to a query, known as metasearch, improves performance on average, provided that the combined systems (1) have compatible outputs, (2) produce accurate estimates of documents' probability of relevance, and (3) are independent of each other. The objective of a normalization technique is to meet the first requirement: document scores from different retrieval outputs are brought onto a common scale so that they are comparable across the combined outputs. This has recently been a subject of research in the metasearch and information filtering fields. In this paper, we present a different perspective on multiple-evidence combination and investigate various normalization techniques, mostly ad hoc in nature, with a special focus on SUM, which shifts the minimum score to zero and then scales the scores so that they sum to one. This approach is equivalent to normalizing the distribution of scores of all documents in a retrieval output by dividing them by their sample mean. We have conducted extensive experiments using the ad hoc tracks of the third and fifth TREC collections and the CLEF'00 database. We argue that (1) the SUM normalization method is consistently better than the other traditionally proposed ones when combining the outputs of search systems operating on a single database, and (2) when combining the outputs of search systems operating on mutually exclusive databases, SUM remains a valuable alternative to weighting the score distributions of documents by the sizes of their databases.
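
For concreteness, the minimal Python sketch below illustrates the SUM normalization described in the abstract alongside a traditional min-max alternative, and fuses two toy retrieval outputs by summing each document's normalized scores across runs. The function names, the CombSUM-style fusion step, and the toy runs are our own illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sum_normalize(scores):
    # SUM normalization as described in the abstract: shift the minimum
    # score to zero, then scale so that the shifted scores sum to one.
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.min()
    total = shifted.sum()
    if total == 0.0:
        # All scores identical: fall back to a uniform distribution.
        return np.full_like(shifted, 1.0 / len(shifted))
    return shifted / total

def min_max_normalize(scores):
    # A traditional alternative: map scores linearly onto [0, 1].
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0.0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span

# Toy example: fuse two retrieval outputs for one query by summing each
# document's normalized scores across runs (a CombSUM-style combination).
run_a = {"d1": 12.0, "d2": 7.5, "d3": 3.1}
run_b = {"d2": 0.91, "d3": 0.40, "d4": 0.22}

combined = {}
for run in (run_a, run_b):
    docs = list(run)
    normed = sum_normalize([run[d] for d in docs])
    for doc, score in zip(docs, normed):
        combined[doc] = combined.get(doc, 0.0) + score

for doc, score in sorted(combined.items(), key=lambda kv: -kv[1]):
    print(f"{doc}: {score:.3f}")
```

Note that dividing the shifted scores by their sum differs from dividing them by their sample mean only by the constant factor n (the run length), which is the equivalence the abstract alludes to; when all combined runs retrieve the same number of documents, the two forms induce the same fused ranking.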