A random polynomial-time algorithm for approximating the volume of convex bodies
Journal of the ACM (JACM)
A technique for measuring the relative size and overlap of public Web search engines
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Accessibility of information on the Web
intelligence
Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking
Query-based sampling of text databases
ACM Transactions on Information Systems (TOIS)
Approximating Aggregate Queries about Web Pages via Random Walks
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
The indexable web is more than 11.5 billion pages
WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Simulated annealing in convex bodies and an O*(n4) volume algorithm
Journal of Computer and System Sciences - Special issue on FOCS 2003
Estimating corpus size via queries
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
A random walk approach to sampling hidden databases
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Random sampling from a search engine's index
Journal of the ACM (JACM)
Efficient estimation of the size of text deep web data source
Proceedings of the 17th ACM conference on Information and knowledge management
Leveraging COUNT Information in Sampling Hidden Databases
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Improving the evaluation of web search systems
ECIR'03 Proceedings of the 25th European conference on IR research
Unbiased estimation of size and other aggregates over hidden web databases
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Ranking bias in deep web size estimation using capture recapture method
Data & Knowledge Engineering
Stratified Sampling for Data Mining on the Deep Web
ICDM '10 Proceedings of the 2010 IEEE International Conference on Data Mining
Hi-index | 0.00 |
Given an unknown set of objects embedded in the Euclidean plane and a nearest-neighbor oracle, how to estimate the set size and other properties of the objects? In this paper we address this problem. We propose an efficient method that uses the Voronoi partitioning of the space by the objects and a nearest-neighbor oracle. Our method can be used in the hidden web/databases context where the goal is to estimate the number of certain objects of interest. Here, we assume that each object has a geographic location and the nearest-neighbor oracle can be realized by applications such as maps, local, or store-locator APIs. We illustrate the performance of our method on several real-world datasets.