The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Real life, real users, and real needs: a study and analysis of user queries on the web
Information Processing and Management: an International Journal
Overview of the sixth text REtrieval conference (TREC-6)
Information Processing and Management: an International Journal - The sixth text REtrieval conference (TREC-6)
Probe, count, and classify: categorizing hidden web databases
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Query-based sampling of text databases
ACM Transactions on Information Systems (TOIS)
Effective site finding using link anchor information
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Modern Information Retrieval
A language modeling framework for resource selection and results merging
Proceedings of the eleventh international conference on Information and knowledge management
Discovering the representative of a search engine
Proceedings of the eleventh international conference on Information and knowledge management
QProber: A system for automatic classification of hidden-Web databases
ACM Transactions on Information Systems (TOIS)
Relevant document distribution estimation method for resource selection
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Comparing the performance of collection selection algorithms
ACM Transactions on Information Systems (TOIS)
Engineering a multi-purpose test collection for web retrieval experiments
Information Processing and Management: an International Journal
ACSC '04 Proceedings of the 27th Australasian conference on Computer science - Volume 26
Collection selection for managed distributed document databases
Information Processing and Management: an International Journal
When one sample is not enough: improving text database selection using shrinkage
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Unified utility maximization framework for resource selection
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Sampling search-engine results
WWW '05 Proceedings of the 14th international conference on World Wide Web
Server selection methods in hybrid portal search
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Distributed search over the hidden web: hierarchical database sampling and selection
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Federated text retrieval from uncooperative overlapped collections
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating sampling methods for uncooperative collections
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Updating collection representations for federated search
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Estimating collection size with logistic regression
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Information Processing and Management: an International Journal
Generalising multiple capture-recapture to non-uniform sample sizes
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Efficient estimation of the size of text deep web data source
Proceedings of the 17th ACM conference on Information and knowledge management
Integral based source selection for uncooperative distributed information retrieval environments
Proceedings of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval
Robust result merging using sample-based score estimates
ACM Transactions on Information Systems (TOIS)
Simple Adaptations of Data Fusion Algorithms for Source Selection
ECIR '09 Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval
SUSHI: scoring scaled samples for server selection
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Effective query expansion for federated search
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Server selection methods in personal metasearch: a comparative empirical study
Information Retrieval
Estimating deep web data source size by capture---recapture method
Information Retrieval
Central-rank-based collection selection in uncooperative distributed information retrieval
ECIR'07 Proceedings of the 29th European conference on IR research
Information Sciences: an International Journal
Unbiased estimation of size and other aggregates over hidden web databases
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Ranking bias in deep web size estimation using capture recapture method
Data & Knowledge Engineering
Modeling information sources as integrals for effective and efficient source selection
Information Processing and Management: an International Journal
Foundations and Trends in Information Retrieval
Attribute domain discovery for hidden web databases
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Database Size Estimation by Query Performance -- A Complexity Aspect
UCC '12 Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing
Size estimation of non-cooperative data collections
Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
Vertical selection in the information domain of children
Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
Rank discovery from web databases
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
Modern distributed information retrieval techniques require accurate knowledge of collection size. In non-cooperative environments, where detailed collection statistics are not available, the size of the underlying collections must be estimated. While several approaches for the estimation of collection size have been proposed, their accuracy has not been thoroughly evaluated. An empirical analysis of past estimation approaches across a variety of collections demonstrates that their prediction accuracy is low. Motivated by ecological techniques for the estimation of animal populations, we propose two new approaches for the estimation of collection size. We show that our approaches are significantly more accurate that previous methods, and are more efficient in use of resources required to perform the estimation.