An evaluation of resource description quality measures
Proceedings of the 2006 ACM symposium on Applied computing
An open problem in Distributed Information Retrieval (DIR) is how to represent large document repositories, also known as resources, both accurately and efficiently. Obtaining resource description estimates is an important phase in DIR, especially in non-cooperative environments. Measuring the quality of an estimated resource description is a contentious issue, as current measures do not provide an adequate indication of quality. In this paper, we provide an overview of the currently applied measures of resource description quality before proposing the Kullback-Leibler (KL) divergence as an alternative. Through experimentation, we illustrate the shortcomings of these past measures while providing evidence that KL is a more appropriate measure of quality. When applying KL to compare different Query-Based Sampling (QBS) algorithms, our experiments provide strong evidence in favour of a previously unsupported hypothesis originally posited in the initial Query-Based Sampling work.
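The idea of using KL divergence as a quality measure can be sketched as follows: treat the true resource and the query-based sample as term distributions, and compute how far the estimate diverges from the actual description. This is a minimal illustrative sketch, not the paper's exact formulation; the smoothing scheme and the interpolation weight `mu` are assumptions introduced here to keep the divergence finite when sampled terms are missing.

```python
import math
from collections import Counter

def kl_divergence(actual_tokens, sampled_tokens, mu=0.5):
    """D(actual || estimate) between two term distributions.

    The estimated model is linearly interpolated with the actual
    model (weight `mu` is an assumption, not from the paper) so that
    terms unseen in the sample do not yield an infinite divergence.
    Lower values indicate a more accurate resource description.
    """
    p = Counter(actual_tokens)       # term counts in the actual resource
    q = Counter(sampled_tokens)      # term counts in the QBS sample
    n_p = sum(p.values())
    n_q = sum(q.values())
    kl = 0.0
    for term, count in p.items():
        p_t = count / n_p
        q_t = mu * (q[term] / n_q) + (1 - mu) * p_t  # smoothed estimate
        kl += p_t * math.log(p_t / q_t)
    return kl

# A sample identical to the resource gives (near) zero divergence;
# a skewed sample gives a positive score.
print(kl_divergence(list("aabb"), list("aabb")))
print(kl_divergence(list("aabb"), list("abbb")))
```

Under this reading, comparing two QBS algorithms reduces to comparing the KL scores of the descriptions they produce against the same actual collection.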