Facilitating discovery on the private web using dataset digests

Authors:
Peter Mork;Ken Smith;Barbara Blaustein;Christopher Wolf;Ken Samuel;Keri Sarver;Irina Vayndiner
Affiliations:
The MITRE Corporation, 7515 Colshire Dr, McLean, VA 22102, USA (all authors)
Venue:
International Journal of Metadata, Semantics and Ontologies
Year:
2010

Abstract

Whereas strategies for discovering content on the surface web are commonplace, similar strategies for the private web are non-existent. In this paper, we first establish a general framework for advertising the existence of private web resources that subsumes many existing summarisation strategies, and is based on succinct statistical summaries (which we call digests). We then investigate the trade-off between the data owners' desires to minimise disclosure of sensitive information and the searchers' desires to minimise query error, demonstrating that our techniques are superior to using k-anonymity for that purpose. Finally, we show that our techniques for summarisation do, in fact, make it possible to discover private web data resources.
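To make the idea of a digest concrete, the following is a minimal sketch, not the authors' method: it assumes a digest is an equi-width histogram over one numeric column, and that a searcher estimates range-query counts from the bucket counts alone. The names `HistogramDigest`, `build_digest`, and `estimate_range_count`, along with the sample data, are hypothetical. Coarser buckets disclose less about individual values but tend to increase query error, which is the kind of trade-off the abstract describes.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class HistogramDigest:
    """Hypothetical digest: equi-width bucket counts for one numeric column."""
    low: float
    high: float
    counts: List[int]

    @property
    def width(self) -> float:
        return (self.high - self.low) / len(self.counts)


def build_digest(values: Sequence[float], low: float, high: float,
                 buckets: int) -> HistogramDigest:
    """Summarise values in [low, high] into `buckets` equal-width counts."""
    counts = [0] * buckets
    width = (high - low) / buckets
    for v in values:
        if not low <= v <= high:
            raise ValueError(f"value {v} outside [{low}, {high}]")
        # The top edge of the domain is folded into the last bucket.
        idx = min(int((v - low) / width), buckets - 1)
        counts[idx] += 1
    return HistogramDigest(low, high, counts)


def estimate_range_count(digest: HistogramDigest, lo: float, hi: float) -> float:
    """Estimate how many values lie in [lo, hi), assuming values are
    spread uniformly inside each bucket (an assumption of this sketch)."""
    total = 0.0
    for i, count in enumerate(digest.counts):
        b_lo = digest.low + i * digest.width
        b_hi = b_lo + digest.width
        overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
        total += count * overlap / digest.width
    return total


# Illustrative data only; not taken from the paper.
ages = [23, 25, 31, 38, 42, 47, 51, 58, 64, 70]
true_count = sum(1 for a in ages if 30 <= a < 50)

fine = build_digest(ages, low=20, high=80, buckets=3)
coarse = build_digest(ages, low=20, high=80, buckets=1)

fine_error = abs(estimate_range_count(fine, 30, 50) - true_count)
coarse_error = abs(estimate_range_count(coarse, 30, 50) - true_count)
```

In this toy setting the single-bucket digest reveals only the row count and the value range, and its estimate for the query is further from the true count than the three-bucket digest's estimate.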