Sub-linear queries statistical databases: privacy with power

  • Authors:
  • Cynthia Dwork

  • Affiliations:
  • Microsoft Research

  • Venue:
  • CT-RSA'05 Proceedings of the 2005 international conference on Topics in Cryptology
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

We consider a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries. In such a database, a query consists of a pair (S,f) where S is a set of rows in the database and f is a function mapping database rows to {0,1}.The true response is ∑r∈Sf(DBr),a noisy version of which is released. Results in [3, 4] show that a strong form of privacy can be maintained using a surprisingly small amount of noise, provided the total number of queries is sublinear in the number n of database rows. We call this a sub-linear queries (SuLQ) database. The assumption of sublinearity becomes reasonable as databases grow increasingly large. The SuLQ primitive – query and noisy reply – gives rise to a calculus of noisy computation. After reviewing some results of [4] on multi-attribute SuLQ, we illustrate the power of the SuLQ primitive with three examples [2]: principal component analysis, k means clustering, and learning in the statistical queries learning model.