The proliferation of government information on local area networks and the Internet creates the problem of finding information that may be distributed among many disjoint text databases, a problem known as distributed information retrieval or federated search. A distributed information retrieval system has three components: resource representation, resource selection, and result merging. Previous research suggested that the CORI algorithm is one of the most effective resource selection algorithms, but its effectiveness in environments containing a wide range of database sizes has not been studied thoroughly. This paper shows that the CORI algorithm does not work well in environments with a skewed distribution of database sizes. We present a new resource selection algorithm based on estimating the distribution of relevant documents among the online databases. The new algorithm selects resources more accurately than the CORI algorithm, which can lead to improved document rankings.
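The abstract does not say how the distribution of relevant documents is estimated, so the following is only a minimal sketch of one sample-based approach, not necessarily the paper's method. It assumes that a small sample of documents is available from each database, that database sizes have been estimated, and that the sampled documents have already been ranked for the query on a single centralized index. Each top-ranked sampled document is then treated as standing in for size / sample-size documents of its database. The function names, the `ratio` cutoff, and the toy data are all hypothetical.

```python
from collections import defaultdict


def estimate_relevant_distribution(ranked_sample, sample_sizes, db_sizes, ratio=0.003):
    """Estimate each database's share of relevant documents (illustrative sketch).

    ranked_sample: database names, one per sampled document, ordered from the
                   best-ranked sampled document to the worst for a query.
    sample_sizes:  number of documents sampled from each database.
    db_sizes:      estimated total number of documents in each database.
    ratio:         assumed fraction of the whole collection treated as relevant.
    """
    threshold = ratio * sum(db_sizes.values())
    scores = defaultdict(float)
    covered = 0.0
    for db in ranked_sample:
        if covered >= threshold:
            break
        # One sampled document represents this many documents in its database.
        weight = db_sizes[db] / sample_sizes[db]
        scores[db] += weight
        covered += weight
    total = sum(scores.values())
    return {db: s / total for db, s in scores.items()} if total else {}


def rank_databases(distribution):
    """Order databases by their estimated share of relevant documents."""
    return sorted(distribution, key=distribution.get, reverse=True)


# Toy example with a skewed size distribution (made-up numbers).
db_sizes = {"small": 100, "large": 10000}
sample_sizes = {"small": 10, "large": 10}
ranked_sample = ["small", "large", "small", "large"]
dist = estimate_relevant_distribution(ranked_sample, sample_sizes, db_sizes, ratio=0.1)
ranking = rank_databases(dist)
```

Because each sampled document is weighted by how much of its database it represents, a large database is not penalized for contributing the same number of sampled documents as a small one, which is the kind of size skew the abstract is concerned with.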