Many server selection methods suitable for distributed information retrieval applications rely, in the absence of cooperation, on the availability of unbiased samples of documents from the constituent collections. We describe a number of sampling methods which depend only on the normal query-response mechanism of the applicable search facilities. We evaluate these methods on a number of collections typical of a personal metasearch application. Results demonstrate that biases exist for all methods, particularly toward longer documents, and that in some cases these biases can be reduced but not eliminated by choice of parameters. We also introduce a new sampling technique, "multiple queries", which produces samples of similar quality to the best current techniques but with significantly reduced cost.
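To make the idea of sampling through the normal query-response mechanism concrete, the following is a minimal sketch of generic query-based sampling against a toy in-memory collection. It is not the paper's "multiple queries" technique, whose details the abstract does not give. Everything here is an assumption made for illustration: the function names (`build_collection`, `search`, `query_based_sample`, `mean_length`), the ranking by term frequency, the single-term random queries, and all parameter values. Comparing the mean document length of the sample with that of the whole collection is one simple way to look for the length bias the abstract mentions.

```python
import random


def build_collection(num_docs, vocab, rng):
    """Toy collection (hypothetical): doc id -> list of terms, with random length."""
    return {
        doc_id: [rng.choice(vocab) for _ in range(rng.randint(5, 200))]
        for doc_id in range(num_docs)
    }


def search(collection, term, top_k):
    """Hypothetical search facility: ids of up to top_k documents containing
    `term`, ranked by term frequency (an assumed ranking function)."""
    hits = [
        (doc.count(term), doc_id)
        for doc_id, doc in collection.items()
        if term in doc
    ]
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in hits[:top_k]]


def query_based_sample(collection, vocab, target_size, top_k, rng,
                       max_queries=10_000):
    """Issue random single-term queries and pool the returned documents
    until target_size distinct documents have been collected."""
    sampled = set()
    for _ in range(max_queries):
        if len(sampled) >= target_size:
            break
        term = rng.choice(vocab)
        for doc_id in search(collection, term, top_k):
            sampled.add(doc_id)
            if len(sampled) >= target_size:
                break
    return sampled


def mean_length(collection, doc_ids):
    """Average number of terms per document over the given ids."""
    ids = list(doc_ids)
    return sum(len(collection[d]) for d in ids) / len(ids)


rng = random.Random(0)
vocab = [f"t{i}" for i in range(500)]
collection = build_collection(1000, vocab, rng)
sample = query_based_sample(collection, vocab, target_size=100, top_k=4, rng=rng)

# Compare the sample's mean length with the collection's to inspect length bias.
print("collection mean length:", mean_length(collection, collection))
print("sample mean length:    ", mean_length(collection, sample))
```

In this toy setting, longer documents contain more distinct terms and higher term counts, so they tend to match and rank well for more queries; varying `top_k` or the query vocabulary is one way to explore how such parameters affect the sample.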