FRAPP: a framework for high-accuracy privacy-preserving mining

Authors:
Shipra Agrawal; Jayant R. Haritsa; B. Aditya Prakash
Affiliations:
Indian Institute of Science, Bangalore, India 560012 and Stanford University, Stanford, USA; Indian Institute of Science, Bangalore, India 560012; Indian Institute of Technology, Mumbai, India 400076
Venue:
Data Mining and Knowledge Discovery
Year:
2009

Abstract

To preserve client privacy in the data mining process, a variety of techniques based on random perturbation of individual data records have been proposed recently. In this paper, we present FRAPP, a generalized matrix-theoretic framework of random perturbation that facilitates a systematic approach to designing perturbation mechanisms for privacy-preserving mining. Specifically, FRAPP is used to demonstrate that (a) the prior techniques differ only in their choices for the perturbation matrix elements, and (b) a symmetric positive-definite perturbation matrix with minimal condition number can be identified, substantially enhancing accuracy even under strict privacy requirements. We also propose a novel perturbation mechanism in which the matrix elements are themselves characterized as random variables, and demonstrate that this feature provides significant improvements in privacy at only a marginal reduction in accuracy. The quantitative utility of FRAPP, a general-purpose random-perturbation-based privacy-preserving mining technique, is evaluated on association rule mining and classification rule mining over a variety of real datasets. Our experimental results indicate that, for a given privacy requirement, FRAPP either incurs substantially lower modeling errors than the prior techniques or yields errors comparable to those of direct mining on the true database.
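The matrix view described in the abstract can be made concrete with a small sketch. It assumes that privacy is expressed as an amplification bound γ (no row of the perturbation matrix A may contain two entries whose ratio exceeds γ) and that the symmetric matrix takes a "gamma-diagonal" form: A[v][u] = P(perturbed value v | original value u) equals γx when u = v and x otherwise, with x = 1/(γ + n - 1) for a domain of n distinct record values. All function names are hypothetical illustrations, not the paper's code, and the randomized-matrix mechanism is not sketched here.

```python
import random
from collections import Counter

def gamma_diagonal(n, gamma):
    """Hypothetical helper: n x n matrix with A[v][u] = P(perturbed v | original u)."""
    if gamma <= 1 or n < 2:
        raise ValueError("need gamma > 1 and n >= 2")
    x = 1.0 / (gamma + n - 1)  # off-diagonal entry; every column sums to 1
    return [[gamma * x if v == u else x for u in range(n)] for v in range(n)]

def amplification(a):
    """Largest ratio between two entries in the same row (the privacy bound)."""
    return max(max(row) / min(row) for row in a)

def condition_number(n, gamma):
    """A = x((gamma-1)I + J) has eigenvalues 1 (once) and x(gamma-1) (n-1 times),
    so its condition number is (gamma-1+n)/(gamma-1)."""
    return (gamma - 1 + n) / (gamma - 1)

def perturb(records, n, gamma, rng):
    """Replace each value u by a value drawn from column u of A."""
    keep = gamma / (gamma + n - 1)  # diagonal entry gamma * x
    out = []
    for u in records:
        if rng.random() < keep:
            out.append(u)
        else:
            v = rng.randrange(n - 1)  # uniform over the n-1 other values
            out.append(v if v < u else v + 1)
    return out

def reconstruct(y, gamma):
    """Estimate original counts as A^{-1} y via the closed-form inverse
    A^{-1} = ((gamma-1+n) I - J) / (gamma-1)."""
    n, total, c = len(y), sum(y), gamma - 1
    return [((c + n) * yu - total) / c for yu in y]

def mat_vec(a, x):
    """Plain matrix-vector product, used to form expected perturbed counts A x."""
    return [sum(a_vu * x_u for a_vu, x_u in zip(row, x)) for row in a]

if __name__ == "__main__":
    n, gamma = 4, 19.0
    a = gamma_diagonal(n, gamma)
    true_counts = [500, 300, 150, 50]
    # Exact check: reconstructing E[Y] = A X recovers X (up to rounding).
    print(reconstruct(mat_vec(a, true_counts), gamma))
    # Monte Carlo check on an actually perturbed database.
    rng = random.Random(0)
    records = [u for u, c in enumerate(true_counts) for _ in range(c)]
    counts = Counter(perturb(records, n, gamma, rng))
    y = [counts[v] for v in range(n)]
    print(y, reconstruct(y, gamma))
```

Because this particular matrix has a closed-form inverse, reconstruction needs no numerical inversion. For a general perturbation matrix one would instead solve A X̂ = Y, and a large condition number would amplify the sampling noise in Y, which is why a matrix with minimal condition number matters for accuracy.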