A signal-to-noise approach to score normalization

Authors:
Avi Arampatzis;Jaap Kamps
Affiliations:
University of Amsterdam, Amsterdam, Netherlands;University of Amsterdam, Amsterdam, Netherlands
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009

Abstract

Score normalization is indispensable in distributed retrieval and in fusion or meta-search, where result lists must be merged. Distributional approaches to score normalization that refer to relevance, such as binary mixture models like the normal-exponential, suffer from a lack of universality and from troublesome parameter estimation, especially under sparse relevance. We develop a new approach that tackles both problems by using aggregate score distributions without reference to relevance, and that is suitable for uncooperative engines. The method assumes that the scores an engine produces consist of a signal component and a noise component, both of which can be approximated by submitting well-defined sets of artificial queries to the engine. We evaluate the method on a standard distributed retrieval testbed and show that the signal-to-noise approach yields better results than other distributional methods. As a significant by-product, we investigate query-length distributions.
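
For orientation, the sketch below illustrates the kind of relevance-based baseline the abstract contrasts against: a normal-exponential binary mixture, where a raw score is mapped to the posterior probability of relevance and result lists are merged on that probability. It is not the signal-to-noise method itself, whose formulas the abstract does not give. All function names, parameter names, and numeric values are hypothetical, and the mixture parameters are simply assumed to be known per engine, whereas in practice they would have to be estimated (the step the abstract calls troublesome). Scores are assumed non-negative.

```python
import math


def normal_pdf(s, mu, sigma):
    """Density of a normal distribution, used here for relevant-document scores."""
    z = (s - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def exponential_pdf(s, rate):
    """Density of an exponential distribution, used here for non-relevant scores."""
    return rate * math.exp(-rate * s)


def posterior_relevance(s, prior, mu, sigma, rate):
    """P(relevant | score) under a normal-exponential mixture (Bayes' rule).

    prior is the assumed fraction of relevant documents; mu, sigma, and rate
    are assumed per-engine parameters (hypothetical values in this sketch).
    """
    if s < 0:
        raise ValueError("this sketch assumes non-negative scores")
    rel = prior * normal_pdf(s, mu, sigma)
    non = (1.0 - prior) * exponential_pdf(s, rate)
    total = rel + non
    return rel / total if total > 0 else 0.0


def merge(results_by_engine, params_by_engine):
    """Merge per-engine (doc_id, raw_score) lists by normalized score, best first."""
    merged = []
    for engine, results in results_by_engine.items():
        params = params_by_engine[engine]
        for doc_id, score in results:
            merged.append((posterior_relevance(score, **params), doc_id, engine))
    merged.sort(reverse=True)
    return merged


# Two hypothetical engines whose raw scores live on very different scales.
params = {
    "A": {"prior": 0.1, "mu": 5.0, "sigma": 1.0, "rate": 1.0},
    "B": {"prior": 0.1, "mu": 80.0, "sigma": 10.0, "rate": 0.1},
}
results = {
    "A": [("a1", 5.0), ("a2", 1.0)],
    "B": [("b1", 50.0)],
}
ranking = [doc_id for _, doc_id, _ in merge(results, params)]
```

Merging on raw scores would put b1 first simply because engine B uses a larger score scale. After normalization, a1 ranks above b1, since a score of 5.0 is typical of relevant documents under engine A's assumed parameters while 50.0 is well below engine B's assumed relevant mean.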