An evaluation of resource description quality measures
Proceedings of the 2006 ACM symposium on Applied computing
An open problem in Distributed Information Retrieval (DIR) is how to represent large document repositories, also known as resources, both accurately and efficiently. Obtaining resource description estimates is an important phase in DIR, especially in non-cooperative environments. Measuring the quality of an estimated resource description is a contentious issue, as current measures do not provide an adequate indication of quality. In this paper, we provide an overview of the currently applied measures of resource description quality before proposing the Kullback-Leibler (KL) divergence as an alternative. Through experimentation, we illustrate the shortcomings of these past measures while providing evidence that KL is a more appropriate measure of quality. When applying KL to compare different Query-Based Sampling (QBS) algorithms, our experiments provide strong evidence in favour of a previously unsupported hypothesis originally posited in the initial Query-Based Sampling work.
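The idea of using KL divergence as a quality measure can be sketched as follows: treat the true resource and the query-based sample as term distributions, and compute how far the estimate diverges from the actual description. This is a minimal illustrative sketch, not the paper's exact formulation; the smoothing scheme and the interpolation weight `mu` are assumptions introduced here to keep the divergence finite when sampled terms are missing.

```python
import math
from collections import Counter

def kl_divergence(actual_tokens, sampled_tokens, mu=0.5):
    """D(actual || estimate) between two term distributions.

    The estimated model is linearly interpolated with the actual
    model (weight `mu` is an assumption, not from the paper) so that
    terms unseen in the sample do not yield an infinite divergence.
    Lower values indicate a more accurate resource description.
    """
    p = Counter(actual_tokens)       # term counts in the actual resource
    q = Counter(sampled_tokens)      # term counts in the QBS sample
    n_p = sum(p.values())
    n_q = sum(q.values())
    kl = 0.0
    for term, count in p.items():
        p_t = count / n_p
        q_t = mu * (q[term] / n_q) + (1 - mu) * p_t  # smoothed estimate
        kl += p_t * math.log(p_t / q_t)
    return kl

# A sample identical to the resource gives (near) zero divergence;
# a skewed sample gives a positive score.
print(kl_divergence(list("aabb"), list("aabb")))
print(kl_divergence(list("aabb"), list("abbb")))
```

Under this reading, comparing two QBS algorithms reduces to comparing the KL scores of the descriptions they produce against the same actual collection.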