A Probabilistic Framework for Building Privacy-Preserving Synopses of Multi-dimensional Data

  • Authors:
  • Filippo Furfaro;Giuseppe M. Mazzeo;Domenico Saccà

  • Affiliations:
  • University of Calabria, Rende (CS), Italy 87036;University of Calabria, Rende (CS), Italy 87036 and ICAR-CNR, Rende (CS), Italy 87036;University of Calabria, Rende (CS), Italy 87036 and ICAR-CNR, Rende (CS), Italy 87036

  • Venue:
  • SSDBM '08 Proceedings of the 20th international conference on Scientific and Statistical Database Management
  • Year:
  • 2008

Abstract

The problem of summarizing multi-dimensional data into lossy synopses that support the estimation of aggregate range queries has been investigated extensively over the last three decades. Several summarization techniques have been proposed, based on different approaches such as histograms, wavelets, and sampling. Most of the work in this area aimed to devise techniques for constructing effective synopses that enable range queries to be estimated, trading off the efficiency of query evaluation against the accuracy of query estimates. In this paper, the use of summarization is investigated in a more specific context, where privacy issues are taken into account. In particular, we study the problem of constructing privacy-preserving synopses, that is, synopses that prevent sensitive information from being extracted while supporting 'safe' analysis tasks. To this end, we introduce a probabilistic framework for evaluating the quality of the estimates that can be obtained by a user who owns the summary data. Based on this framework, we devise a technique for constructing histogram-based synopses of multi-dimensional data that provide answers as accurate as possible for a given workload of 'safe' queries, while preventing high-quality estimates of sensitive information from being extracted.
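The following is a minimal sketch of what a histogram-based synopsis for aggregate range queries looks like. It is not the authors' construction. It assumes a fixed equi-width 2-D grid of buckets, where each bucket keeps only the sum of the cells it covers, and it estimates a range-SUM query by assuming that values are spread uniformly inside each bucket. The names `Bucket`, `build_grid_histogram`, and `estimate_range_sum` are hypothetical. The sketch also deliberately omits the privacy constraints that are the paper's contribution.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    # Hypothetical bucket: inclusive cell-index bounds in each of the 2 dimensions
    lo: tuple[int, int]
    hi: tuple[int, int]
    total: float  # the only information kept: the sum of the summarized cells


def overlap(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> int:
    """Number of integer cells shared by the inclusive ranges [a_lo, a_hi] and [b_lo, b_hi]."""
    return max(0, min(a_hi, b_hi) - max(a_lo, b_lo) + 1)


def build_grid_histogram(data: list[list[float]], step: int) -> list[Bucket]:
    """Partition a 2-D array into step x step blocks and store only each block's sum."""
    rows, cols = len(data), len(data[0])
    buckets = []
    for r in range(0, rows, step):
        for c in range(0, cols, step):
            r_hi = min(r + step, rows) - 1
            c_hi = min(c + step, cols) - 1
            total = sum(data[i][j] for i in range(r, r_hi + 1) for j in range(c, c_hi + 1))
            buckets.append(Bucket((r, c), (r_hi, c_hi), float(total)))
    return buckets


def estimate_range_sum(buckets: list[Bucket], q_lo: tuple[int, int], q_hi: tuple[int, int]) -> float:
    """Estimate SUM over the query box, assuming a uniform spread of values inside each bucket."""
    estimate = 0.0
    for b in buckets:
        shared, volume = 1, 1
        for d in range(2):
            shared *= overlap(b.lo[d], b.hi[d], q_lo[d], q_hi[d])
            volume *= b.hi[d] - b.lo[d] + 1
        # Each bucket contributes the fraction of its sum that falls inside the query box
        estimate += b.total * shared / volume
    return estimate


data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
h = build_grid_histogram(data, 2)
print(estimate_range_sum(h, (0, 0), (3, 3)))  # whole domain: 136.0 (exact)
print(estimate_range_sum(h, (0, 0), (0, 0)))  # single cell: 3.5 (true value is 1)
```

Queries aligned with bucket boundaries are answered exactly. Queries that cut through buckets rely on the uniformity assumption, so their accuracy depends on how the buckets are chosen. In the paper's setting, that same bucket choice is what must keep 'safe' queries accurate while keeping estimates of sensitive information poor.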