Using sampled data and regression to merge search engine results

Authors:
Luo Si;Jamie Callan
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2002

Citing 18
Cited 33

TREC and TIPSTER experiments with INQUERY

TREC-2 Proceedings of the second conference on Text retrieval conference
Dissemination of collection wide information in a distributed information retrieval system

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Learning collection fusion strategies

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
STARTS: Stanford proposal for Internet meta-searching

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Analyses of multiple evidence combination

Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Effective retrieval with distributed collections

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Comparing the performance of database selection algorithms

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Cluster-based language models for distributed retrieval

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A decision-theoretic approach to database selection in networked IR

ACM Transactions on Information Systems (TOIS)
Server selection on the World Wide Web

DL '00 Proceedings of the fifth ACM conference on Digital libraries
The impact of database selection on distributed searching

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Database merging strategy based on logistic regression

Information Processing and Management: an International Journal
Collection selection and results merging with topically organized U.S. patents and TREC data

Proceedings of the ninth international conference on Information and knowledge management
Query-based sampling of text databases

ACM Transactions on Information Systems (TOIS)
Modeling score distributions for combining the outputs of search engines

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Models for metasearch

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Server Ranking for Distributed Text Retrieval Systems on the Internet

Proceedings of the Fifth International Conference on Database Systems for Advanced Applications (DASFAA)

Pruning long documents for distributed information retrieval

Proceedings of the eleventh international conference on Information and knowledge management
A language modeling framework for resource selection and results merging

Proceedings of the eleventh international conference on Information and knowledge management
Relevant document distribution estimation method for resource selection

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
A semisupervised learning method to merge search engine results

ACM Transactions on Information Systems (TOIS)
Shadow document methods of results merging

Proceedings of the 2004 ACM symposium on Applied computing
Ranked Relations: Query Languages and Query Processing Methods for Multimedia

Multimedia Tools and Applications
Merging Results for Distributed Content Based Image Retrieval

Multimedia Tools and Applications
Unified utility maximization framework for resource selection

Proceedings of the thirteenth ACM international conference on Information and knowledge management
A utility theoretic approach to determining optimal wait times in distributed information retrieval

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Distributed information retrieval with skewed database size distributions

dg.o '03 Proceedings of the 2003 annual national conference on Digital government research
Reducing storage costs for federated search of text databases

dg.o '03 Proceedings of the 2003 annual national conference on Digital government research
Collaborative research - digital government: a language modeling approach to metadata for cross-database linkage and search

dg.o '04 Proceedings of the 2004 annual national conference on Digital government research
ProbFuse: a probabilistic approach to data fusion

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Distributed query sampling: a quality-conscious approach

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Testing the cluster hypothesis in distributed information retrieval

Information Processing and Management: an International Journal
Query performance prediction

Information Systems
Result merging methods in distributed information retrieval with overlapping databases

Information Retrieval
An outranking approach for rank aggregation in information retrieval

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Probability-based fusion of information retrieval result sets

Artificial Intelligence Review
Robust result merging using sample-based score estimates

ACM Transactions on Information Systems (TOIS)
Collection profiling for collection fusion in distributed information retrieval systems

KSEM'07 Proceedings of the 2nd international conference on Knowledge science, engineering and management
Extending probabilistic data fusion using sliding windows

ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
Mining Query Logs: Turning Search Usage Data into Knowledge

Foundations and Trends in Information Retrieval
Using missing documents in metasearch aggregation: an application of OWA operator

FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 7
Federated Search

Foundations and Trends in Information Retrieval
Query efficiency prediction for dynamic pruning

Proceedings of the 9th workshop on Large-scale and distributed informational retrieval
A formal approach to evaluate and compare internet search engines: a case study on searching the Chinese web

APWeb'05 Proceedings of the 7th Asia-Pacific web conference on Web Technologies Research and Development
Usercentric Operational Decision Making in Distributed Information Retrieval

Information Systems Research
Evaluation of result merging strategies for metasearch engines

WISE'05 Proceedings of the 6th international conference on Web Information Systems Engineering
Learning to predict response times for online query scheduling

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Aggregating evidence from hospital departments to improve medical records search

ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Reducing the uncertainty in resource selection

ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Distributed information retrieval and applications

ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval

Quantified Score

Hi-index	0.02

Abstract

This paper addresses the problem of merging results obtained from different databases and search engines in a distributed information retrieval environment. Prior research on this problem either assumes the exchange of statistics necessary for normalizing scores (cooperative solutions) or is heuristic. Both approaches have disadvantages. We show that the problem in uncooperative environments is simpler when viewed as a component of a distributed IR system that uses query-based sampling to create resource descriptions. Documents sampled for creating resource descriptions can also be used to create a sample centralized index, and this index is a source of training data for adaptive results merging algorithms. A variety of experiments demonstrate that this new approach is more effective than a well-known alternative, and that it allows query-by-query tuning of the results merging function.
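
The abstract gives no implementation details, so the following Python sketch is only one possible reading of the idea, not the authors' algorithm. It assumes that, for a single query, each database returns documents with database-specific scores, and that some of those documents also appear in the sample centralized index with a comparable score. Those overlapping documents serve as training pairs for a per-database, per-query linear regression that maps database-specific scores onto the centralized scale. The names `fit_line`, `merge_results`, `db_results`, and `sample_scores` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple


def fit_line(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of y = a * x + b over (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two overlapping documents")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    if var_x == 0:
        raise ValueError("database scores of overlapping documents are identical")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def merge_results(
    db_results: Dict[str, Dict[str, float]],
    sample_scores: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Merge per-database ranked lists for one query (illustrative sketch).

    db_results:    database name -> {doc id -> database-specific score}
    sample_scores: doc id -> score from the sample centralized index
                   (assumed available only for sampled documents)
    """
    merged: List[Tuple[str, float]] = []
    for scores in db_results.values():
        # Training data: documents seen both in this database's result
        # list and in the sample centralized index.
        pairs = [
            (score, sample_scores[doc])
            for doc, score in scores.items()
            if doc in sample_scores
        ]
        a, b = fit_line(pairs)  # refit for every query and database
        # Map every returned document onto the centralized score scale.
        merged.extend((doc, a * score + b) for doc, score in scores.items())
    return sorted(merged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    results = {
        "db_a": {"a1": 10.0, "a2": 8.0, "a3": 6.0},
        "db_b": {"b1": 0.5, "b2": 0.3, "b3": 0.1},
    }
    sampled = {"a1": 0.9, "a3": 0.5, "b1": 0.8, "b2": 0.6}
    for doc_id, score in merge_results(results, sampled):
        print(doc_id, round(score, 3))
```

Because the regression is refit for each query, the mapping adapts query by query, which is the property the abstract highlights. A real system would also need a fallback for databases with fewer than two overlapping documents; this sketch simply raises an error in that case.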