Privacy in data systems

Authors:
Rakesh Agrawal
Affiliations:
IBM Almaden Research Center, San Jose, CA
Venue:
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2003

Citing 9

Security-control methods for statistical databases: a comparative study
ACM Computing Surveys (CSUR)

Privacy-preserving data mining
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data

On the design and quantification of privacy preserving data mining algorithms
PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Privacy preserving mining of association rules
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

Privacy preserving association rule mining in vertically partitioned data
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

An XPath-based preference language for P3P
WWW '03 Proceedings of the 12th international conference on World Wide Web

Information sharing across private databases
Proceedings of the 2003 ACM SIGMOD international conference on Management of data

Hippocratic databases
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

Watermarking relational databases
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

Cited 2

CITRIS and data and knowledge engineering: what is old and what is new?
Data & Knowledge Engineering - Special jubilee issue: DKE 50

P4A: A New Privacy Model for XML
Proceedings of the 22nd annual IFIP WG 11.3 working conference on Data and Applications Security

Abstract

The explosive progress in networking, storage, and processor technologies is resulting in an unprecedented amount of digitization of information. In concert with this dramatic increase in digital data, concerns about the privacy of personal information have emerged globally. The concerns over massive collection of data are naturally extending to analytic tools applied to data. Data mining, with its promise to efficiently discover valuable, non-obvious information from large databases, is particularly vulnerable to misuse.

The challenge for the database community is to design information systems that protect the privacy and ownership of individual data without impeding information flow. One way of preserving privacy of individual data values is to perturb them. Since the primary task in data mining is the development of models about aggregated data, we explore if we can develop accurate models without access to precise information in individual data records. We consider the concrete case of building a decision-tree classifier from perturbed data. While it is not possible to accurately estimate original values in individual data records, we describe a reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data. We also discuss how to discover association rules over privacy preserved data.

Inspired by the privacy tenet of the Hippocratic Oath, we argue that future database systems must include responsibility for the privacy of data they manage as a founding tenet. We enunciate the key principles for such Hippocratic database systems, distilled from the principles behind current privacy legislations and guidelines. We identify the technical challenges and problems in designing Hippocratic databases, and also outline some solution approaches.
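
The perturb-then-reconstruct idea described in the abstract can be illustrated with a small sketch. The abstract does not specify the noise model or the reconstruction procedure, so everything below is an assumption made for illustration: Gaussian additive noise, a fixed set of bins, and an iterative Bayesian (EM-style) update of the bin probabilities. The function names `perturb` and `reconstruct_distribution` are hypothetical and are not taken from the paper.

```python
import math
import random


def perturb(values, sigma, rng):
    """Hide each individual value by adding independent Gaussian noise.

    Assumption: zero-mean Gaussian noise with known standard deviation sigma.
    """
    return [v + rng.gauss(0.0, sigma) for v in values]


def reconstruct_distribution(perturbed, bin_edges, sigma, iterations=50):
    """Estimate the probability mass of the ORIGINAL values in each bin.

    Only the perturbed values and the noise parameter sigma are used.
    Each bin is represented by its midpoint (a simplifying assumption).
    """
    n_bins = len(bin_edges) - 1
    mids = [(bin_edges[k] + bin_edges[k + 1]) / 2.0 for k in range(n_bins)]

    # Likelihood of seeing perturbed value w if the original sat at midpoint m.
    # The Gaussian normalizing constant cancels in the update, so it is omitted.
    lik = [
        [math.exp(-((w - m) ** 2) / (2.0 * sigma ** 2)) for m in mids]
        for w in perturbed
    ]

    est = [1.0 / n_bins] * n_bins  # start from a uniform guess
    for _ in range(iterations):
        new = [0.0] * n_bins
        for row in lik:
            weights = [row[k] * est[k] for k in range(n_bins)]
            total = sum(weights)
            if total == 0.0:
                continue  # value too far from every bin to be informative
            for k in range(n_bins):
                # Posterior probability that this record came from bin k.
                new[k] += weights[k] / total
        norm = sum(new)
        est = [x / norm for x in new]
    return est
```

A possible use: draw original values from a known distribution, call `perturb` with a seeded `random.Random`, and pass the noisy values to `reconstruct_distribution` with bin edges covering the original range. The returned list sums to 1 and approximates the shape of the original distribution, even though no individual original value is recovered. A classifier would then be trained against these estimated distributions rather than the raw records.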