Whereas strategies for discovering content on the surface web are commonplace, similar strategies for the private web are nonexistent. In this paper, we first establish a formal framework for advertising the existence of private web resources that subsumes many existing summarization strategies based on succinct statistical summaries (which we call digests). We then investigate the tradeoff between the data owners' desire to minimize disclosure and the searchers' desire to minimize query error, demonstrating that our techniques are superior to k-anonymity. Finally, we show that our summarization techniques do make it possible to discover private web data resources.