Meta-search, the combination of the outputs of different search engines in response to a query, has been shown to improve performance. Because the scores produced by different search engines are not comparable, researchers have often decomposed the meta-search problem into a score normalization step followed by a combination step. Combination has been studied by many researchers. Although appropriate normalization can affect performance, most of the normalization schemes suggested are ad hoc in nature. In this paper, we propose a formal approach to normalizing scores for meta-search by taking the distributions of the scores into account. Recently, it has been shown that for search engines the score distributions for a given query may be modeled using an exponential distribution for the set of non-relevant documents and a normal distribution for the set of relevant documents. Here, it is shown that equalizing the distributions of scores of the top non-relevant documents yields the best meta-search performance reported in the literature. Since relevance information is not available a priori, we discuss two different ways of obtaining a good approximation to the distribution of scores of non-relevant documents. The first is obtained by looking at the distribution of scores of all documents. The second is obtained by fitting a mixture model of an exponential and a Gaussian to the scores of all documents and using the resulting exponential distribution as an estimate of the non-relevant distribution. We show with experiments on TREC-3, TREC-4 and TREC-9 data that the best combination results are obtained by averaging the parameters obtained from these approximations. These techniques work on a variety of search engines, including vector space search engines like SMART and probabilistic search engines like INQUERY.
The problem of normalization is important in many other areas, including information filtering, topic detection and tracking, multilingual search, and distributed retrieval. Thus, the techniques proposed here are likely to be applicable to many of these tasks.
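The two approximations to the non-relevant score distribution described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes the exponential is shifted to start at the lowest observed score, that "equalizing" means rescaling each engine's scores so the estimated non-relevant exponential has unit mean, and that the parameter being averaged is the exponential mean. The function names (`exp_mean_all`, `exp_mean_mixture`, `normalize`, `comb_sum`), the EM initialization, and the use of a CombSUM-style sum for the combination step are all hypothetical choices made for the example.

```python
import math
import random


def exp_mean_all(scores):
    """Approximation 1: treat every document as non-relevant.
    The MLE of an exponential's mean is the sample mean (after shifting)."""
    lo = min(scores)
    return lo, sum(s - lo for s in scores) / len(scores)


def exp_mean_mixture(scores, iters=200):
    """Approximation 2: EM fit of an exponential (non-relevant) plus a
    Gaussian (relevant) mixture; returns the exponential component's mean."""
    lo = min(scores)
    x = [s - lo for s in scores]
    n = len(x)
    # Assumed initialization: exponential from all scores, Gaussian from top 10%.
    exp_mean = max(sum(x) / n, 1e-9)
    top = sorted(x)[-max(2, n // 10):]
    mu = sum(top) / len(top)
    var = max(sum((t - mu) ** 2 for t in top) / len(top), 1e-6)
    w = 0.1  # assumed starting weight of the Gaussian (relevant) component
    for _ in range(iters):
        # E-step: responsibility of the Gaussian component for each score.
        r = []
        for xi in x:
            pe = (1 - w) * math.exp(-xi / exp_mean) / exp_mean
            pg = w * math.exp(-(xi - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
            tot = pe + pg
            r.append(pg / tot if tot > 0 else 0.0)
        # M-step: re-estimate weight, Gaussian parameters, exponential mean.
        rg = sum(r)
        re = n - rg
        if rg < 1e-9 or re < 1e-9:
            break  # one component has vanished; keep the last estimates
        w = rg / n
        mu = sum(ri * xi for ri, xi in zip(r, x)) / rg
        var = max(sum(ri * (xi - mu) ** 2 for ri, xi in zip(r, x)) / rg, 1e-6)
        exp_mean = max(sum((1 - ri) * xi for ri, xi in zip(r, x)) / re, 1e-9)
    return lo, exp_mean


def normalize(scores):
    """Rescale one engine's scores by the average of the two mean estimates."""
    lo, m1 = exp_mean_all(scores)
    _, m2 = exp_mean_mixture(scores)
    m = max((m1 + m2) / 2, 1e-9)
    return [(s - lo) / m for s in scores]


def comb_sum(runs):
    """Combination step (assumed CombSUM): sum normalized scores per document.
    `runs` is a list of dicts mapping doc id -> raw score, one per engine."""
    fused = {}
    for run in runs:
        ids = list(run)
        for doc, s in zip(ids, normalize([run[i] for i in ids])):
            fused[doc] = fused.get(doc, 0.0) + s
    return sorted(fused, key=fused.get, reverse=True)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic scores: many non-relevant (exponential), few relevant (Gaussian).
    raw = [random.expovariate(0.5) for _ in range(300)]
    raw += [random.gauss(12.0, 1.0) for _ in range(30)]
    engine_a = {f"d{i}": s for i, s in enumerate(raw)}
    engine_b = {f"d{i}": 4.0 * s for i, s in enumerate(raw)}  # same ranking, other scale
    print(comb_sum([engine_a, engine_b])[:5])
```

In this sketch the normalized scores do not depend on an engine's score scale, so two engines that rank identically but report scores in different units contribute equally to the fused ranking. Whether averaging the exponential mean or its rate is closer to the paper's procedure is not stated in the abstract, so that choice should be read as an assumption.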