Searching distributed collections with inference networks
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Automatic discovery of language models for text databases
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Comparing the performance of database selection algorithms
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
GlOSS: text-source discovery over the Internet
ACM Transactions on Database Systems (TODS)
Server selection on the World Wide Web
DL '00 Proceedings of the fifth ACM conference on Digital libraries
Real life, real users, and real needs: a study and analysis of user queries on the web
Information Processing and Management: an International Journal
Query-based sampling of text databases
ACM Transactions on Information Systems (TOIS)
Building efficient and effective metasearch engines
ACM Computing Surveys (CSUR)
Modern Information Retrieval
QProber: A system for automatic classification of hidden-Web databases
ACM Transactions on Information Systems (TOIS)
Server Ranking for Distributed Text Retrieval Systems on the Internet
Proceedings of the Fifth International Conference on Database Systems for Advanced Applications (DASFAA)
Relevant document distribution estimation method for resource selection
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Comparing the performance of collection selection algorithms
ACM Transactions on Information Systems (TOIS)
Engineering a multi-purpose test collection for web retrieval experiments
Information Processing and Management: an International Journal
Collection selection for managed distributed document databases
Information Processing and Management: an International Journal
When one sample is not enough: improving text database selection using shrinkage
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Classifying and searching hidden-web text databases
Classifying and searching hidden-web text databases
Information retrieval system evaluation: effort, sensitivity, and reliability
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
The automatic creation of literature abstracts
IBM Journal of Research and Development
Distributed text retrieval from overlapping collections
ADC '07 Proceedings of the eighteenth conference on Australasian database - Volume 63
Using query logs to establish vocabularies in distributed information retrieval
Information Processing and Management: an International Journal
Federated text retrieval from uncooperative overlapped collections
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Robust result merging using sample-based score estimates
ACM Transactions on Information Systems (TOIS)
A Topic-Based Measure of Resource Description Quality for Distributed Information Retrieval
ECIR '09 Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval
Classification-based resource selection
Proceedings of the 18th ACM conference on Information and knowledge management
Exploiting peer relations for distributed multimedia information retrieval
ICME'09 Proceedings of the 2009 IEEE international conference on Multimedia and Expo
Central-rank-based collection selection in uncooperative distributed information retrieval
ECIR'07 Proceedings of the 29th European conference on IR research
Foundations and Trends in Information Retrieval
Adaptive query-based sampling of distributed collections
SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval
Hi-index | 0.00 |
The goal of distributed information retrieval is to support effective searching over multiple document collections. For efficiency, queries should be routed to only those collections that are likely to contain relevant documents, so it is necessary to first obtain information about the content of the target collections. In an uncooperative environment, query probing — where randomly-chosen queries are used to retrieve a sample of the documents and thus of the lexicon — has been proposed as a technique for estimating statistical term distributions. In this paper we rebut the claim that a sample of 300 documents is sufficient to provide good coverage of collection terms. We propose a novel sampling strategy and experimentally demonstrate that sample size needs to vary from collection to collection, that our methods achieve good coverage based on variable-sized samples, and that we can use the results of a probe to determine when to stop sampling.