Leveraging COUNT Information in Sampling Hidden Databases

Authors:
Arjun Dasgupta; Nan Zhang; Gautam Das
Venue:
ICDE '09: Proceedings of the 2009 IEEE International Conference on Data Engineering
Year:
2009

Abstract

A large number of online databases are hidden behind form-like interfaces that let users execute search queries by specifying selection conditions in the interface. Most of these interfaces return restricted answers (e.g., only the top-k of the selected tuples), and many of them also accompany each answer with the COUNT of the selected tuples. In this paper, we propose techniques that leverage the COUNT information to efficiently acquire unbiased samples of the hidden database. We also discuss variants for interfaces that do not provide COUNT information. We conduct extensive experiments to illustrate the efficiency and accuracy of our techniques.
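Why COUNT information enables unbiased sampling can be illustrated with a count-weighted random drill-down. Start from the empty query and repeatedly fix one more attribute, choosing each value with probability proportional to its reported COUNT. Stop once the query matches at most k tuples, then pick one of the returned tuples uniformly. The branch probabilities telescope, (c1/N)(c2/c1)...(1/ch), so every tuple is reached with probability exactly 1/N. The Python below is a minimal sketch of that general idea under stated assumptions, not the authors' algorithm. `HiddenDB`, `count_weighted_sample` and their parameters are hypothetical. Attribute domains are assumed to be known and complete, and tuples are assumed distinct so that a fully specified query matches at most one tuple.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, ...]


class HiddenDB:
    """Hypothetical stand-in for a form interface: returns top-k rows plus COUNT."""

    def __init__(self, rows: Sequence[Row], k: int) -> None:
        self.rows = list(rows)
        self.k = k
        self.queries_issued = 0

    def query(self, conds: Dict[int, int]) -> Tuple[List[Row], int]:
        # Conjunctive selection: attribute index -> required value.
        self.queries_issued += 1
        matched = [r for r in self.rows if all(r[a] == v for a, v in conds.items())]
        return matched[: self.k], len(matched)


def count_weighted_sample(
    db: HiddenDB, domains: Sequence[Sequence[int]], rng: random.Random
) -> Row:
    """Draw one tuple with probability 1/N via a COUNT-weighted drill-down."""
    conds: Dict[int, int] = {}
    top, count = db.query(conds)
    if count == 0:
        raise ValueError("the hidden database is empty")
    for attr, domain in enumerate(domains):
        if count <= db.k:
            break  # query no longer overflows: every match is in `top`
        # Ask for the COUNT of each branch; empty branches are never taken.
        branches = []
        for value in domain:
            top_v, count_v = db.query({**conds, attr: value})
            if count_v > 0:
                branches.append((value, top_v, count_v))
        # Choose a branch with probability count_v / count.
        value, top, count = rng.choices(branches, weights=[b[2] for b in branches])[0]
        conds[attr] = value
    # With distinct rows, a fully specified query matches at most one row.
    assert count <= db.k, "rows must be distinct for this sketch"
    return rng.choice(top)


if __name__ == "__main__":
    rng = random.Random(7)
    # Skewed data: a count-oblivious walk would return (1, 0, 0) half the time.
    rows = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0)]
    db = HiddenDB(rows, k=1)
    freq = Counter(count_weighted_sample(db, [(0, 1)] * 3, rng) for _ in range(5000))
    print(sorted(freq.items()))  # each row should appear roughly 1000 times
```

This sketch spends one query per candidate value at every level. Because the branch COUNTs sum to the parent's COUNT, the last branch's COUNT could instead be obtained by subtraction. Reducing query cost in this way is the kind of efficiency concern a practical sampler has to address.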