A large number of online databases are hidden behind form-like interfaces that let users execute search queries by specifying selection conditions. Most of these interfaces return restricted answers (e.g., only the top-k of the selected tuples), and many also accompany each answer with the COUNT of the selected tuples. In this paper, we propose techniques that leverage the COUNT information to efficiently acquire unbiased samples of the hidden database. We also discuss variants for interfaces that do not provide COUNT information. We conduct extensive experiments to illustrate the efficiency and accuracy of our techniques.
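To make the idea of COUNT-guided sampling concrete, the following is a minimal sketch and not necessarily the algorithm proposed in the paper. It assumes a hypothetical interface `query(conditions)` that returns the top-k matching tuples together with the COUNT of all matches, simulated here by `make_interface` over an in-memory list. It also assumes that the attribute domains are known and complete, that tuples are distinct over those attributes, and that the database is non-empty. The sampler drills down one attribute at a time, choosing each value with probability proportional to its COUNT, and stops once the query no longer overflows. Under these assumptions the branch probabilities telescope, so each tuple is drawn with probability 1/N.

```python
import random

def make_interface(rows, k):
    """Simulate a hypothetical top-k form interface that also reports COUNT."""
    def query(conditions):
        matches = [r for r in rows
                   if all(r[a] == v for a, v in conditions.items())]
        return matches[:k], len(matches)  # (top-k answer, COUNT of all matches)
    return query

def count_weighted_sample(query, domains, k, rng=random):
    """Draw one tuple by drilling down, weighting each branch by its COUNT.

    Assumes: domains are complete, tuples are distinct over the attributes
    in `domains`, and the database is non-empty.
    """
    conditions = {}
    answer, count = query(conditions)
    for attr, values in domains.items():
        if count <= k:
            break  # no overflow: every matching tuple is visible
        # Ask the interface for the COUNT of each candidate branch.
        counts = [query({**conditions, attr: v})[1] for v in values]
        # Pick a branch with probability COUNT(branch) / COUNT(parent).
        chosen = rng.choices(values, weights=counts, k=1)[0]
        conditions[attr] = chosen
        answer, count = query(conditions)
    # Choose uniformly among the fully visible tuples of the final query.
    return rng.choice(answer)

# Usage on a small, deliberately skewed toy database (illustrative data only).
rows = [{"a": a, "b": b, "c": c}
        for a, b, c in [(0, 0, 0), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]]
domains = {"a": [0, 1], "b": [0, 1], "c": [0, 1]}
query = make_interface(rows, k=2)
sample = count_weighted_sample(query, domains, k=2, rng=random.Random(0))
```

The sketch issues one COUNT query per candidate value at every level, which is simple but costly; reducing that query cost is exactly the kind of efficiency concern the abstract raises.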