Adaptive query-based sampling of distributed collections

Authors:
Mark Baillie;Leif Azzopardi;Fabio Crestani
Affiliations:
Department of Computing and Information Sciences, University of Strathclyde, Glasgow, UK;Department of Computing and Information Sciences, University of Strathclyde, Glasgow, UK;Department of Computing and Information Sciences, University of Strathclyde, Glasgow, UK
Venue:
SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval
Year:
2006

Abstract

In a Distributed Information Retrieval system, a description of each remote information resource, archive or repository is usually stored centrally to facilitate resource selection. The acquisition of precise resource descriptions is therefore an important phase in Distributed Information Retrieval, because the quality of these representations affects selection accuracy and, ultimately, retrieval performance. Query-Based Sampling is currently used for content discovery of uncooperative resources, but the technique depends on heuristic guidelines to determine when a sufficiently accurate representation of each remote resource has been obtained. In this paper we address this shortcoming by using the Predictive Likelihood to indicate both the quality of an acquired resource description estimate and the point during Query-Based Sampling at which a sufficiently good representation of a resource has been obtained.
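
The idea in the abstract, sampling a resource through its search interface and stopping once a likelihood-based quality signal levels off, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the estimator or the stopping rule, so everything here is an assumption made for illustration. That covers the toy `search` function standing in for a remote search interface, the add-mu smoothed unigram model, the use of a held-out query set to compute the predictive likelihood, and the `epsilon` and `patience` parameters.

```python
import math
import random
from collections import Counter


def search(collection, term, k=4):
    """Hypothetical remote search interface: first k documents containing term."""
    return [doc for doc in collection if term in doc.split()][:k]


def predictive_log_likelihood(counts, queries, vocab_size, mu=1.0):
    """Mean per-term log-likelihood of held-out queries under an add-mu
    smoothed unigram model estimated from the sampled term counts.
    Assumes queries contains at least one term."""
    total = sum(counts.values())
    log_lik, n_terms = 0.0, 0
    for query in queries:
        for term in query.split():
            p = (counts[term] + mu) / (total + mu * vocab_size)
            log_lik += math.log(p)
            n_terms += 1
    return log_lik / n_terms


def adaptive_qbs(collection, seed_term, queries, vocab_size,
                 epsilon=0.01, patience=2, max_probes=50, rng=None):
    """Query-based sampling that stops once the predictive likelihood has
    changed by less than epsilon for `patience` consecutive probes.
    The stopping rule is an assumed one, chosen only for illustration."""
    rng = rng or random.Random(0)
    counts, seen = Counter(), set()
    history, stable, term = [], 0, seed_term
    for _ in range(max_probes):
        # Probe the resource and fold any unseen documents into the sample.
        for doc in search(collection, term):
            if doc not in seen:
                seen.add(doc)
                counts.update(doc.split())
        history.append(predictive_log_likelihood(counts, queries, vocab_size))
        if len(history) > 1 and abs(history[-1] - history[-2]) < epsilon:
            stable += 1
        else:
            stable = 0
        if stable >= patience:
            break
        # Choose the next probe term from the vocabulary learned so far.
        if counts:
            term = rng.choice(sorted(counts))
    return counts, history


if __name__ == "__main__":
    docs = ["apple banana cherry", "banana cherry date", "cherry date elder",
            "date elder fig", "elder fig grape", "apple grape banana"]
    description, trace = adaptive_qbs(docs, "apple", ["banana cherry", "fig grape"],
                                      vocab_size=7)
    print(len(trace), "probes;", len(description), "distinct terms sampled")
```

The returned `history` plays the role of the quality indication and the break condition plays the role of the stopping decision. In contrast, a fixed-budget heuristic would run the loop for a preset number of probes or documents regardless of how the estimate behaves.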