Server selection is an important subproblem in distributed information retrieval (DIR), but it has commonly been studied with collections of more or less uniform size and more or less homogeneous content. Realistic DIR applications, in contrast, may feature much more varied collections. In particular, personal metasearch, a novel application of DIR that includes all of a user's online resources, may involve collections that vary in size by several orders of magnitude and hold highly varied data. We describe a number of algorithms for server selection and consider their effectiveness when collections vary widely in size and are represented by imperfect samples. We compare the algorithms on a personal metasearch testbed comprising calendar, email, mailing list and web collections, where collection sizes differ by three orders of magnitude. We then explore the effect of collection size variations using four partitionings of the TREC ad hoc data used in many other DIR experiments. Kullback-Leibler divergence, previously considered poorly effective, performs better than expected in this application; other techniques thought to be effective perform poorly and are not appropriate for this problem. A strong correlation between the rankings produced by many techniques and size-based rankings may be responsible.
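
As a rough illustration of the Kullback-Leibler approach mentioned above, the sketch below ranks collections by the divergence between a query's term distribution and a smoothed language model built from each collection's sample. The abstract does not give the exact formulation, so this is an assumption: the function name `kl_rank`, the linear smoothing weight `lam`, and the pooled background model are all hypothetical choices made for the example, not the authors' method.

```python
import math
from collections import Counter


def kl_rank(query_terms, samples, lam=0.5):
    """Rank collections by ascending KL(query || smoothed sample model).

    query_terms: list of query tokens.
    samples: dict mapping a collection name to a list of tokens sampled from it.
    lam: weight on the collection model versus the pooled background model
         (an assumed smoothing scheme, chosen only for illustration).
    """
    # Background model pooled over all samples, used to smooth each collection.
    background = Counter()
    for terms in samples.values():
        background.update(terms)
    bg_total = sum(background.values())

    q_counts = Counter(query_terms)
    q_total = len(query_terms)

    scores = {}
    for name, terms in samples.items():
        counts = Counter(terms)
        total = len(terms)
        kl = 0.0
        for term, qf in q_counts.items():
            p_q = qf / q_total
            p_bg = background[term] / bg_total if bg_total else 0.0
            p_coll = counts[term] / total if total else 0.0
            p_c = lam * p_coll + (1 - lam) * p_bg
            if p_c == 0.0:
                # Term appears in no sample; it cannot separate collections.
                continue
            kl += p_q * math.log(p_q / p_c)
        scores[name] = kl

    # Lower divergence means the sample looks more like the query.
    return sorted(scores, key=scores.get)
```

Note that nothing in this score depends on collection size directly; any size effect enters only through how large or representative each sample is.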