Federated search in the wild: the combined power of over a hundred search engines

Authors:
Dong Nguyen;Thomas Demeester;Dolf Trieschnigg;Djoerd Hiemstra
Affiliations:
University of Twente, Enschede, Netherlands;Ghent University, Ghent, Belgium;University of Twente, Enschede, Netherlands;University of Twente, Enschede, Netherlands
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012


Abstract

Federated search has the potential to improve web search: the user becomes less dependent on a single search provider, and parts of the deep web become available through a unified interface, leading to a wider variety of retrieved search results. However, a publicly available dataset for federated search reflecting an actual web environment has been absent. As a result, it has been difficult to assess whether proposed systems are suitable for the web setting. We introduce a new test collection containing the results from more than a hundred actual search engines, ranging from large general web search engines such as Google and Bing to small domain-specific engines. We discuss the design and analyze the effect of several sampling methods. For a set of test queries, we collected relevance judgements for the top 10 results of each search engine. The dataset is publicly available and is useful for researchers interested in resource selection for web search collections, result merging, and size estimation of uncooperative resources.