Defining and enforcing privacy in data sharing

  • Authors:
  • Johannes Gehrke;Ashwin Kumar Venkatanaga Machanavajjhala

  • Affiliations:
  • Cornell University;Cornell University

  • Venue:
  • Defining and enforcing privacy in data sharing
  • Year:
  • 2008

Abstract

Recent advances in processing and storing information have led to an explosion of data collection. Many organizations, such as the Census, hospitals, and even search engine companies, collect, analyze, and distribute personal information in return for useful services. However, the collected data track the entire public and private lives of individuals, creating an immense risk of unauthorized disclosure. This dissertation presents novel conceptual and practical tools that ensure the privacy of individuals while enabling the dissemination of valuable data about humans to improve their lives. Our contributions include novel formal definitions of the privacy risk arising from unauthorized disclosure, and practical algorithms for enforcing these definitions of privacy. We consider two distinct settings of data dissemination that require different notions of privacy. In the first part of this dissertation, we consider a setting where no sensitive information should be disclosed. We study the problem of deciding whether answering a query on a relational database leads to any disclosure of sensitive information. This problem was shown to be intractable; we propose practical algorithms for a reasonably large set of query classes. In the second part of the dissertation, we consider the problem of publishing "anonymous" aggregate information about populations of individuals while preserving the privacy of individual-specific information. We present a novel framework for reasoning about the privacy risk in this setting. We also propose the first formal privacy definition and practical algorithms for publishing "anonymous" data that provably guarantee the privacy of the individuals contributing to the data while releasing useful aggregate information. Finally, we present a case study of applying formal privacy definitions to a real Census data publishing application.