Aggregate suppression for enterprise search engines

Authors:
Mingyang Zhang; Nan Zhang; Gautam Das
Affiliations:
George Washington University, Washington, DC, USA; George Washington University, Washington, DC, USA; University of Texas at Arlington, Arlington, TX, USA
Venue:
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Year:
2012

Abstract

Many enterprise websites provide search engines to facilitate customer access to their underlying documents or data. Through the web interface of such a search engine, a customer specifies one or a few keywords of interest, and the search engine returns a list of documents/tuples matching those keywords, sorted by an often-proprietary scoring function. It was traditionally believed that, because of its highly restrictive interface (i.e., keyword search only, no SQL-style queries), such a search engine serves its purpose of answering individual keyword-search queries without disclosing big-picture aggregates over the data, aggregates that, as we shall show in the paper, may raise significant privacy concerns for the enterprise. However, recent work on sampling and aggregate estimation over a search engine's corpus through its keyword-search interface challenges this traditional belief. In this paper, we consider the novel problem of suppressing sensitive aggregates for enterprise search engines while maintaining the quality of the answers returned for individual keyword-search queries. We demonstrate the effectiveness and efficiency of our techniques through theoretical analysis and extensive experimental studies.
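The abstract's premise, that big-picture aggregates can leak through a keyword-only, top-k interface, can be illustrated with a textbook query-pool estimator based on inverse-degree weighting. This is a minimal sketch of the threat, not the authors' suppression technique and not necessarily the exact method of the sampling work the abstract refers to. It assumes a hypothetical in-memory corpus, a stand-in ranking (ascending document id), that any query returning k results may be truncated and is therefore discarded, and that an attacker can read the keywords of each returned document. All names (`search`, `is_valid`, `degree`, `query_weight`, `estimate_corpus_size`, `TOY_CORPUS`, `TOY_POOL`) are hypothetical. Each document returned by a valid query is weighted by 1/deg(d), the number of valid pool queries that return it, so the weights of all valid queries sum to the number of documents reachable through the pool; sampling queries uniformly and scaling by the pool size gives an unbiased estimate of that count.

```python
import random
from typing import Dict, List, Sequence, Set

# Hypothetical toy corpus: document id -> keywords the document contains.
Corpus = Dict[int, Set[str]]

TOY_CORPUS: Corpus = {
    1: {"privacy", "search", "data"},
    2: {"search", "engine", "data"},
    3: {"engine", "aggregate", "data"},
    4: {"aggregate", "privacy", "data"},
    5: {"keyword"},
    6: {"privacy"},
}
TOY_POOL: List[str] = ["privacy", "search", "engine", "aggregate", "keyword", "data"]


def search(corpus: Corpus, keyword: str, k: int) -> List[int]:
    """Simulated keyword-search interface returning at most k document ids.

    Matches are ranked by a stand-in score (ascending id); a real engine
    would use its own, often proprietary, scoring function.
    """
    matches = sorted(doc_id for doc_id, words in corpus.items() if keyword in words)
    return matches[:k]


def is_valid(corpus: Corpus, keyword: str, k: int) -> bool:
    """A query returning k results may be truncated, so only shorter answers count."""
    return len(search(corpus, keyword, k)) < k


def degree(corpus: Corpus, doc_id: int, pool: Sequence[str], k: int) -> int:
    """Number of valid pool queries that return doc_id.

    Assumes the attacker can read the keywords of any returned document.
    """
    words = corpus[doc_id]
    return sum(1 for w in pool if w in words and is_valid(corpus, w, k))


def query_weight(corpus: Corpus, keyword: str, pool: Sequence[str], k: int) -> float:
    """Sum of 1/degree over the results of one query (0 for invalid queries)."""
    if not is_valid(corpus, keyword, k):
        return 0.0
    # The query itself is valid and matches d, so degree(d) >= 1 here.
    return sum(1.0 / degree(corpus, d, pool, k) for d in search(corpus, keyword, k))


def estimate_corpus_size(
    corpus: Corpus, pool: Sequence[str], k: int, samples: int, seed: int = 0
) -> float:
    """Unbiased estimate of how many documents some valid pool query returns.

    Queries are drawn uniformly (with replacement) from the pool; scaling the
    mean weight by len(pool) gives the estimate.
    """
    rng = random.Random(seed)
    total = sum(query_weight(corpus, rng.choice(pool), pool, k) for _ in range(samples))
    return len(pool) * total / samples


if __name__ == "__main__":
    est = estimate_corpus_size(TOY_CORPUS, TOY_POOL, k=3, samples=5000, seed=1)
    print(f"estimated number of reachable documents: {est:.2f}")
```

On this toy data the `privacy` and `data` queries fill all k = 3 result slots and are discarded, so document 6 (reachable only through `privacy`) is invisible to the estimator, which therefore targets 5 documents rather than 6. The sketch shows only that such an aggregate is estimable through the interface; how to suppress it is the subject of the paper.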