Discovering frequent patterns in sensitive data

Authors:
Raghav Bhaskar;Srivatsan Laxman;Adam Smith;Abhradeep Thakurta
Affiliations:
Microsoft Research, Bangalore, India;Microsoft Research, Bangalore, India;Pennsylvania State University, University Park, PA, USA;Pennsylvania State University, University Park, PA, USA
Venue:
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2010

Abstract

Discovering frequent patterns from data is a popular exploratory technique in data mining. However, if the data are sensitive (e.g., patient health records, user behavior records), releasing information about significant patterns or trends carries significant risk to privacy. This paper shows how one can accurately discover and release the most significant patterns along with their frequencies in a data set containing sensitive information, while providing rigorous guarantees of privacy for the individuals whose information is stored there. We present two efficient algorithms for discovering the k most frequent patterns in a data set of sensitive records. Our algorithms satisfy differential privacy, a recently introduced definition that provides meaningful privacy guarantees in the presence of arbitrary external information. Differentially private algorithms require a degree of uncertainty in their output to preserve privacy. Our algorithms handle this by returning 'noisy' lists of patterns that are close to the actual list of k most frequent patterns in the data. We define a new notion of utility that quantifies the output accuracy of private top-k pattern mining algorithms. In typical data sets, our utility criterion implies low false positive and false negative rates in the reported lists. We prove that our methods meet the new utility criterion; we also demonstrate the performance of our algorithms through extensive experiments on the transaction data sets from the FIMI repository. While the paper focuses on frequent pattern mining, the techniques developed here are relevant whenever the data mining output is a list of elements ordered according to an appropriately 'robust' measure of interest.
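
To make the idea of a 'noisy' top-k list concrete, the following is a minimal illustrative sketch and is not taken from the paper: the abstract does not specify how its two algorithms work. The sketch assumes a standard construction, repeated selection with the exponential mechanism, where each of the k rounds uses a privacy budget of epsilon / k and the score of a pattern is its support count (which changes by at most 1 when one record is added or removed). The function names `itemset_counts` and `private_top_k`, and the parameter names, are hypothetical. The candidate list is assumed to be fixed in advance and independent of the data, and the sketch releases only the selected patterns, not their frequencies.

```python
import math
import random
from collections import Counter
from itertools import combinations


def itemset_counts(transactions, size):
    """Count how many transactions contain each itemset of the given size."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(set(t)), size):
            counts[itemset] += 1
    return counts


def private_top_k(candidates, counts, k, epsilon, rng=None):
    """Pick k patterns by k rounds of the exponential mechanism.

    Assumptions (not from the paper): `candidates` is a public,
    data-independent list; each count has sensitivity 1; the total
    budget `epsilon` is split evenly across the k rounds.
    """
    rng = rng or random.Random()
    eps_round = epsilon / k
    remaining = {c: counts.get(c, 0) for c in candidates}
    chosen = []
    for _ in range(min(k, len(remaining))):
        names = list(remaining)
        top = max(remaining.values())
        # Subtracting the maximum keeps the exponent <= 0 and avoids overflow;
        # it does not change the sampling distribution.
        weights = [math.exp(eps_round * (remaining[c] - top) / 2.0) for c in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


if __name__ == "__main__":
    data = [["bread", "milk"], ["bread", "milk", "eggs"], ["bread"], ["milk"]]
    universe = [("bread",), ("milk",), ("eggs",), ("tea",)]
    singles = itemset_counts(data, 1)
    print(private_top_k(universe, singles, k=2, epsilon=1.0))
```

With a small epsilon the returned list is close to uniform over the candidates; as epsilon grows, it concentrates on the truly most frequent patterns. This trade-off is the kind of behavior the utility criterion in the abstract is meant to quantify.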