Server selection methods in personal metasearch: a comparative empirical study

Authors:
Paul Thomas;David Hawking
Affiliations:
CSIRO ICT Centre, Canberra, Australia 2601;Funnelback Pty Ltd, Dickson, Australia 2601
Venue:
Information Retrieval
Year:
2009

Citing 36
Cited 10

The effectiveness of GIOSS for the text database discovery problem

SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Searching distributed collections with inference networks

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating database selection techniques: a testbed and experiment

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Methods for information server selection

ACM Transactions on Information Systems (TOIS)
Automatic discovery of language models for text databases

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Comparing the performance of database selection algorithms

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Cluster-based language models for distributed retrieval

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A decision-theoretic approach to database selection in networked IR

ACM Transactions on Information Systems (TOIS)
GlOSS: text-source discovery over the Internet

ACM Transactions on Database Systems (TODS)
Analysis of a very large web search engine query log

ACM SIGIR Forum
Server selection on the World Wide Web

DL '00 Proceedings of the fifth ACM conference on Digital libraries
Real life, real users, and real needs: a study and analysis of user queries on the web

Information Processing and Management: an International Journal
The impact of database selection on distributed searching

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Towards a highly-scalable and effective metasearch engine

Proceedings of the 10th international conference on World Wide Web
Approaches to collection selection and results merging for distributed information retrieval

Proceedings of the tenth international conference on Information and knowledge management
Discovering the representative of a search engine

Proceedings of the tenth international conference on Information and knowledge management
A language modeling framework for resource selection and results merging

Proceedings of the eleventh international conference on Information and knowledge management
Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Server Ranking for Distributed Text Retrieval Systems on the Internet

Proceedings of the Fifth International Conference on Database Systems for Advanced Applications (DASFAA)
U.S. versus European web searching trends

ACM SIGIR Forum
Evaluating different methods of estimating retrieval quality for resource selection

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Relevant document distribution estimation method for resource selection

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Comparing the performance of collection selection algorithms

ACM Transactions on Information Systems (TOIS)
A semisupervised learning method to merge search engine results

ACM Transactions on Information Systems (TOIS)
Collection selection for managed distributed document databases

Information Processing and Management: an International Journal
Hourly analysis of a very large topically categorized web query log

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Server selection methods in hybrid portal search

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Modeling search engine effectiveness for federated search

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
The FedLemur project: Federated search in the real world

Journal of the American Society for Information Science and Technology
Capturing collection size for distributed non-cooperative retrieval

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
AllInOneNews: development and evaluation of a large-scale news metasearch engine

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Evaluating sampling methods for uncooperative collections

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Updating collection representations for federated search

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Experiences evaluating personal metasearch

Proceedings of the second international symposium on Information interaction in context
Central-rank-based collection selection in uncooperative distributed information retrieval

ECIR'07 Proceedings of the 29th European conference on IR research
Adaptive query-based sampling of distributed collections

SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval

Collection-integral source selection for uncooperative distributed information retrieval environments

Information Sciences: an International Journal
Ranking using multiple document types in desktop search

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Federated Search

Foundations and Trends in Information Retrieval
A multi-collection latent topic model for federated search

Information Retrieval
Evaluating server selection for federated search

ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval
Source selection for image retrieval in peer-to-peer networks

FDIA'09 Proceedings of the Third BCS-IRSG conference on Future Directions in Information Access
Federated search in the wild: the combined power of over a hundred search engines

Proceedings of the 21st ACM international conference on Information and knowledge management
Studying the clustering paradox and scalability of search in highly distributed environments

ACM Transactions on Information Systems (TOIS)
Distributed information retrieval and applications

ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Using hybrid techniques for resource description and selection in the context of distributed geographic information retrieval

SSTD'13 Proceedings of the 13th international conference on Advances in Spatial and Temporal Databases

Quantified Score

Hi-index	0.00

Visualization

Abstract

Server selection is an important subproblem in distributed information retrieval (DIR) but has commonly been studied with collections of more or less uniform size and with more or less homogeneous content. In contrast, realistic DIR applications may feature much more varied collections. In particular, personal metasearch--a novel application of DIR which includes all of a user's online resources--may involve collections which vary in size by several orders of magnitude, and which have highly varied data. We describe a number of algorithms for server selection, and consider their effectiveness when collections vary widely in size and are represented by imperfect samples. We compare the algorithms on a personal metasearch testbed comprising calendar, email, mailing list and web collections, where collection sizes differ by three orders of magnitude. We then explore the effect of collection size variations using four partitionings of the TREC ad hoc data used in many other DIR experiments. Kullback-Leibler divergence, previously considered poorly effective, performs better than expected in this application; other techniques thought to be effective perform poorly and are not appropriate for this problem. A strong correlation with size-based rankings for many techniques may be responsible.