The explosive progress in networking, storage, and processor technologies is resulting in an unprecedented amount of digitization of information. In concert with this dramatic increase in digital data, concerns about the privacy of personal information have emerged globally. The concerns over massive collection of data are naturally extending to analytic tools applied to data. Data mining, with its promise to efficiently discover valuable, non-obvious information from large databases, is particularly vulnerable to misuse.

The challenge for the database community is to design information systems that protect the privacy and ownership of individual data without impeding information flow. One way of preserving the privacy of individual data values is to perturb them. Since the primary task in data mining is the development of models about aggregated data, we explore whether we can develop accurate models without access to precise information in individual data records. We consider the concrete case of building a decision-tree classifier from perturbed data. While it is not possible to accurately estimate original values in individual data records, we describe a reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data. We also discuss how to discover association rules over privacy-preserved data.

Inspired by the privacy tenet of the Hippocratic Oath, we argue that future database systems must include responsibility for the privacy of data they manage as a founding tenet. We enunciate the key principles for such Hippocratic database systems, distilled from the principles behind current privacy legislation and guidelines. We identify the technical challenges and problems in designing Hippocratic databases, and also outline some solution approaches.
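The perturb-then-reconstruct idea described above (recovering the distribution of the original values rather than the individual records) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the additive zero-mean Gaussian noise, the binned value domain, the iterative Bayesian update, and the function names `perturb` and `reconstruct`. It is one plausible scheme of this kind, not necessarily the reconstruction procedure the abstract refers to.

```python
import math
import random


def perturb(values, sigma, rng):
    """Hide each value by adding independent zero-mean Gaussian noise (assumed noise model)."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def gaussian_pdf(x, sigma):
    """Density of the assumed noise distribution at x."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def reconstruct(perturbed, sigma, lo, hi, bins=20, iterations=50):
    """Estimate the probability mass of the ORIGINAL values in each bin of [lo, hi].

    Only the perturbed values and the noise density are used; individual
    original values are never recovered.
    """
    width = (hi - lo) / bins
    mids = [lo + (k + 0.5) * width for k in range(bins)]
    est = [1.0 / bins] * bins  # start from a uniform guess

    # Likelihood of each perturbed observation given each candidate bin midpoint.
    like = [[gaussian_pdf(w - m, sigma) for m in mids] for w in perturbed]

    for _ in range(iterations):
        new = [0.0] * bins
        for row in like:
            denom = sum(l * p for l, p in zip(row, est))
            if denom == 0.0:
                continue  # observation too far from every bin to be informative
            # Posterior probability that this observation came from each bin.
            for k in range(bins):
                new[k] += row[k] * est[k] / denom
        total = sum(new)
        est = [v / total for v in new]
    return mids, est


if __name__ == "__main__":
    rng = random.Random(0)
    original = [rng.uniform(20.0, 40.0) for _ in range(500)]
    noisy = perturb(original, sigma=10.0, rng=rng)
    mids, est = reconstruct(noisy, sigma=10.0, lo=0.0, hi=100.0)
    est_mean = sum(m * p for m, p in zip(mids, est))
    # Compare the true mean with the mean of the reconstructed distribution.
    print(round(sum(original) / len(original), 2), round(est_mean, 2))
```

In this sketch the data miner sees only `noisy` and the noise parameter `sigma`. A classifier could then be trained against the estimated per-bin masses in place of the raw values, which is the sense in which models are built without access to precise individual records.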