Limiting privacy breaches in privacy preserving data mining

Authors:
Alexandre Evfimievski;Johannes Gehrke;Ramakrishnan Srikant
Affiliations:
Cornell University;Cornell University;IBM Almaden Research Center
Venue:
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2003

Citing 7
Cited 184

Error-correcting codes and finite fields (student ed.)
Privacy-preserving data mining

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
On the design and quantification of privacy preserving data mining algorithms

PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Privacy preserving mining of association rules

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Privacy preserving association rule mining in vertically partitioned data

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Maintaining data privacy in association rule mining

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

On the Privacy Preserving Properties of Random Data Perturbation Techniques

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Privacy preserving mining of association rules

Information Systems - Knowledge discovery and data mining (KDD 2002)
Mining association rules with non-uniform privacy concerns

Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Optimal randomization for privacy preserving data mining

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Hiding Sensitive Patterns in Association Rules Mining

COMPSAC '04 Proceedings of the 28th Annual International Computer Software and Applications Conference - Volume 01
CITRIS and data and knowledge engineering: what is old and what is new?

Data & Knowledge Engineering - Special jubilee issue: DKE 50
Cardinality-based inference control in OLAP systems: an information theoretic approach

Proceedings of the 7th ACM international workshop on Data warehousing and OLAP
A Framework for High-Accuracy Privacy-Preserving Mining

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
On the complexity of optimal K-anonymity

PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Random-data perturbation techniques and privacy-preserving data mining

Knowledge and Information Systems
Simulatable auditing

Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Practical privacy: the SuLQ framework

Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Privacy-enhancing k-anonymization of customer data

Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Deriving private information from randomized data

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Privacy preserving OLAP

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Cardinality-based inference control in data cubes

Journal of Computer Security
Blocking-aware private record linkage

Proceedings of the 2nd international workshop on Information quality in information systems
Anonymity-preserving data collection

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Random Projection-Based Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining

IEEE Transactions on Knowledge and Data Engineering
Privacy Preserving Data Classification with Rotation Perturbation

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
A privacy-preserving collaborative filtering scheme with two-way communication

EC '06 Proceedings of the 7th ACM conference on Electronic commerce
Privacy via pseudorandom sketches

Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Dynamic authenticated index structures for outsourced databases

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Injecting utility into anonymized datasets

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
A privacy-preserving technique for Euclidean distance-based mining algorithms using Fourier-related transforms

The VLDB Journal — The International Journal on Very Large Data Bases
Privacy leakage in multi-relational databases: a semi-supervised learning perspective

The VLDB Journal — The International Journal on Very Large Data Bases
Towards robustness in query auditing

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Privacy-preserving mining by rotational data transformation

Proceedings of the 43rd annual Southeast regional conference - Volume 1
L-diversity: Privacy beyond k-anonymity

ACM Transactions on Knowledge Discovery from Data (TKDD)
A formal analysis of information disclosure in data exchange

Journal of Computer and System Sciences
Hiding informative association rule sets

Expert Systems with Applications: An International Journal
Probable innocence revisited

Theoretical Computer Science - Automated reasoning for security protocol analysis
Privacy-preserving boosting

Data Mining and Knowledge Discovery
Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography

Proceedings of the 16th international conference on World Wide Web
Privacy-enhancing personalized web search

Proceedings of the 16th international conference on World Wide Web
Smooth sensitivity and sampling in private data analysis

Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
Privacy, accuracy, and consistency too: a holistic solution to contingency table release

Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Hiding collaborative recommendation association rules

Applied Intelligence
Large-scale collection and sanitization of network security data: risks and challenges

NSPW '06 Proceedings of the 2006 workshop on New security paradigms
Challenges in mining social network data: processes, privacy, and paradoxes

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Private web search

Proceedings of the 2007 ACM workshop on Privacy in electronic society
Preserving privacy in association rule mining with bloom filters

Journal of Intelligent Information Systems
Time series compressibility and privacy

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
The boundary between privacy and utility in data publishing

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
The applicability of the perturbation based privacy preserving data mining for real-world data

Data & Knowledge Engineering
Privacy preserving data obfuscation for inherently clustered data

International Journal of Information and Computer Security
A fuzzy programming approach for data reduction and privacy in distance-based mining

International Journal of Information and Computer Security
A novel data distortion approach via selective SSVD for privacy protection

International Journal of Information and Computer Security
An efficient sanitization algorithm for balancing information privacy and knowledge discovery in association patterns mining

Data & Knowledge Engineering
A learning theory approach to non-interactive database privacy

STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Dynamic anonymization: accurate statistical analysis with privacy preservation

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Preservation of proximity privacy in publishing numerical sensitive data

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Epistemic privacy

Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Efficient sanitization of informative association rules

Expert Systems with Applications: An International Journal
Anonymity preserving pattern discovery

The VLDB Journal — The International Journal on Very Large Data Bases
Providing k-anonymity in data mining

The VLDB Journal — The International Journal on Very Large Data Bases
A privacy preserving technique for distance-based classification with worst case privacy guarantees

Data & Knowledge Engineering
Guided perturbation: towards private and accurate mining

The VLDB Journal — The International Journal on Very Large Data Bases
The cost of privacy: destruction of data-mining utility in anonymized data publishing

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Composition attacks and auxiliary information in data privacy

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Secrecy in Multiagent Systems

ACM Transactions on Information and System Security (TISSEC)
Privacy Preserving Data Mining Research: Current Status and Key Issues

ICCS '07 Proceedings of the 7th international conference on Computational Science, Part III: ICCS 2007
Privacy-Preserving Set Union

ACNS '07 Proceedings of the 5th international conference on Applied Cryptography and Network Security
Generic Probability Density Function Reconstruction for Randomization in Privacy-Preserving Data Mining

MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
Privacy Preserving Market Basket Data Analysis

PKDD 2007 Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases
Distributed Private Data Analysis: Simultaneously Solving How and What

CRYPTO 2008 Proceedings of the 28th Annual conference on Cryptology: Advances in Cryptology
Simulatable Binding: Beyond Simulatable Auditing

SDM '08 Proceedings of the 5th VLDB workshop on Secure Data Management
Generalization-Based Privacy-Preserving Data Collection

DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery
Protecting business intelligence and customer privacy while outsourcing data mining tasks

Knowledge and Information Systems
Access control over uncertain data

Proceedings of the VLDB Endowment
An efficient protocol for private and accurate mining of support counts

Pattern Recognition Letters
PoolView: stream privacy for grassroots participatory sensing

Proceedings of the 6th ACM conference on Embedded network sensor systems
Maintenance of sanitizing informative association rules

Expert Systems with Applications: An International Journal
Determining error bounds for spectral filtering based reconstruction methods in privacy preserving data mining

Knowledge and Information Systems
FRAPP: a framework for high-accuracy privacy-preserving mining

Data Mining and Knowledge Discovery
Privacy preserving itemset mining through noisy items

Expert Systems with Applications: An International Journal
Evaluating privacy threats in released database views by symmetric indistinguishability

Journal of Computer Security - Selected papers from the Third and Fourth Secure Data Management (SDM) workshops
Adversarial-knowledge dimensions in data privacy

The VLDB Journal — The International Journal on Very Large Data Bases
Efficient and Anonymous Online Data Collection

DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications
Private coresets

Proceedings of the forty-first annual ACM symposium on Theory of computing
A brief survey on anonymization techniques for privacy preserving publishing of social network data

ACM SIGKDD Explorations Newsletter
Collusion-resistant anonymous data collection method

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
k-Anonymous data collection

Information Sciences: an International Journal
Relationship privacy: output perturbation for queries with joins

Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Attacks on privacy and deFinetti's theorem

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
On Distributed k-Anonymization

Fundamenta Informaticae
A distributed approach to enabling privacy-preserving model-based classifier training

Knowledge and Information Systems
Reconstructing Data Perturbed by Random Projections When the Mixing Matrix Is Known

ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part II
A novel anonymization algorithm: Privacy protection and knowledge preservation

Expert Systems with Applications: An International Journal
Privacy-Preserving Data Publishing

Foundations and Trends in Databases
Employing PRBAC for privacy preserving data publishing

Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human
Quantifying information flow with beliefs

Journal of Computer Security - 18th IEEE Computer Security Foundations Symposium (CSF 18)
Information theory for data management

Proceedings of the VLDB Endowment
Data publishing against realistic adversaries

Proceedings of the VLDB Endowment
Optimal random perturbation at multiple privacy levels

Proceedings of the VLDB Endowment
Publishing naive Bayesian classifiers: privacy without accuracy loss

Proceedings of the VLDB Endowment
Distributed data mining and agents

Engineering Applications of Artificial Intelligence
Inference in distributed data clustering

Engineering Applications of Artificial Intelligence
A cubic-wise balance approach for privacy preservation in data cubes

Information Sciences: an International Journal
A cryptography index technology and method to measure information disclosure in the DAS model

WSEAS Transactions on Information Science and Applications
Hiding collaborative recommendation association rules on horizontally partitioned data

Intelligent Data Analysis
Transparent anonymization: Thwarting adversaries who know the algorithm

ACM Transactions on Database Systems (TODS)
The hardness and approximation algorithms for l-diversity

Proceedings of the 13th International Conference on Extending Database Technology
Beyond k-Anonymity: A Decision Theoretic Framework for Assessing Privacy Risk

Transactions on Data Privacy
Privacy Preserving Categorical Data Analysis with Unknown Distortion Parameters

Transactions on Data Privacy
A practice-oriented framework for measuring privacy and utility in data sanitization systems

Proceedings of the 2010 EDBT/ICDT Workshops
Privacy preserving linear discriminant analysis from perturbed data

Proceedings of the 2010 ACM Symposium on Applied Computing
Using cryptography for privacy protection in data mining systems

WImBI'06 Proceedings of the 1st WICI international conference on Web intelligence meets brain informatics
K-anonymization with minimal loss of information

ESA'07 Proceedings of the 15th annual European conference on Algorithms
Efficient privacy preserving distributed clustering based on secret sharing

PAKDD'07 Proceedings of the 2007 international conference on Emerging technologies in knowledge discovery and data mining
On addressing accuracy concerns in privacy preserving association rule mining

PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
Differential privacy: a survey of results

TAMC'08 Proceedings of the 5th international conference on Theory and applications of models of computation
An ad omnia approach to defining and achieving private data analysis

PinKDD'07 Proceedings of the 1st ACM SIGKDD international conference on Privacy, security, and trust in KDD
Privacy-preserving data mining through knowledge model sharing

PinKDD'07 Proceedings of the 1st ACM SIGKDD international conference on Privacy, security, and trust in KDD
Towards privacy-preserving model selection

PinKDD'07 Proceedings of the 1st ACM SIGKDD international conference on Privacy, security, and trust in KDD
Preserving the privacy of sensitive relationships in graph data

PinKDD'07 Proceedings of the 1st ACM SIGKDD international conference on Privacy, security, and trust in KDD
Towards an axiomatization of statistical privacy and utility

Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Differentially private aggregation of distributed time-series with transformation and encryption

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Information theory for data management

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Versatile publishing for privacy preservation

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Discovering frequent patterns in sensitive data

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Airavat: security and privacy for MapReduce

NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Epistemic privacy

Journal of the ACM (JACM)
Fast secure computation of set intersection

SCN'10 Proceedings of the 7th international conference on Security and cryptography for networks
A survey on statistical disclosure control and micro-aggregation techniques for secure statistical databases

Software—Practice & Experience - Focus on Selected PhD Literature Reviews in the Practical Aspects of Software Technology
Extending l-diversity to generalize sensitive data

Data & Knowledge Engineering
Small domain randomization: same privacy, more utility

Proceedings of the VLDB Endowment
ρ-uncertainty: inference-proof transaction anonymization

Proceedings of the VLDB Endowment
Privacy-preserving publishing microdata with full functional dependencies

Data & Knowledge Engineering
Discord region based analysis to improve data utility of privately published time series

ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications: Part I
Social Network Analysis and Mining for Business Applications

ACM Transactions on Intelligent Systems and Technology (TIST)
Privacy-preserving Bayesian network parameter learning

CIMMACS'05 Proceedings of the 4th WSEAS international conference on Computational intelligence, man-machine systems and cybernetics
Differentially Private Empirical Risk Minimization

The Journal of Machine Learning Research
A survey on privacy in mobile participatory sensing applications

Journal of Systems and Software
Distributed social graph embedding

Proceedings of the 20th ACM international conference on Information and knowledge management
Utility-driven anonymization in data publishing

Proceedings of the 20th ACM international conference on Information and knowledge management
What Can We Learn Privately?

SIAM Journal on Computing
Ask a better question, get a better answer a new approach to private data analysis

ICDT'07 Proceedings of the 11th international conference on Database Theory
Privacy in GLAV information integration

ICDT'07 Proceedings of the 11th international conference on Database Theory
Query evaluation on a database given by a random graph

ICDT'07 Proceedings of the 11th international conference on Database Theory
An attacker's view of distance preserving maps for privacy preserving data mining

PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases
Privacy-preserving decision tree mining based on random substitutions

ETRICS'06 Proceedings of the 2006 international conference on Emerging Trends in Information and Communication Security
Differential privacy

ICALP'06 Proceedings of the 33rd international conference on Automata, Languages and Programming - Volume Part II
Suppressing microdata to prevent probabilistic classification based inference

SDM'05 Proceedings of the Second VDLB international conference on Secure Data Management
Privacy in database publishing

ICDT'05 Proceedings of the 10th international conference on Database Theory
Anonymizing tables

ICDT'05 Proceedings of the 10th international conference on Database Theory
An information theoretic privacy and utility measure for data sanitization mechanisms

Proceedings of the second ACM conference on Data and Application Security and Privacy
Inference on distributed data clustering

MLDM'05 Proceedings of the 4th international conference on Machine Learning and Data Mining in Pattern Recognition
Toward privacy in public databases

TCC'05 Proceedings of the Second international conference on Theory of Cryptography
Performance measurements for privacy preserving data mining

PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Data distortion for privacy protection in a terrorist analysis system

ISI'05 Proceedings of the 2005 IEEE international conference on Intelligence and Security Informatics
Indistinguishability: the other aspect of privacy

SDM'06 Proceedings of the Third VLDB international conference on Secure Data Management
Practical private set intersection protocols with linear complexity

FC'10 Proceedings of the 14th international conference on Financial Cryptography and Data Security
When random sampling preserves privacy

CRYPTO'06 Proceedings of the 26th annual international conference on Advances in Cryptology
Our data, ourselves: privacy via distributed noise generation

EUROCRYPT'06 Proceedings of the 24th annual international conference on The Theory and Applications of Cryptographic Techniques
Beyond k-anonymity: a decision theoretic framework for assessing privacy risk

PSD'06 Proceedings of the 2006 CENEX-SDC project international conference on Privacy in Statistical Databases
Calibrating noise to sensitivity in private data analysis

TCC'06 Proceedings of the Third conference on Theory of Cryptography
k-Concealment: An Alternative Model of k-Type Anonymity

Transactions on Data Privacy
Towards statistical queries over distributed private user data

NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
A historical probability based noise generation strategy for privacy protection in cloud computing

Journal of Computer and System Sciences
Limiting link disclosure in social network analysis through subgraph-wise perturbation

Proceedings of the 15th International Conference on Extending Database Technology
An information-theoretic model of voting systems

Mathematical and Computer Modelling: An International Journal
A trust-based noise injection strategy for privacy protection in cloud

Software—Practice & Experience
Privacy-preserving frequent itemsets mining via secure collaborative framework

Security and Communication Networks
Survey: DNA-inspired information concealing: A survey

Computer Science Review
P-top-k queries in a probabilistic framework from information extraction models

Computers & Mathematics with Applications
A Time-Series Pattern Based Noise Generation Strategy for Privacy Protection in Cloud Computing

CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Societal computing

Proceedings of the 34th International Conference on Software Engineering
Anonymizing set-valued data by nonreciprocal recoding

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Publishing microdata with a robust privacy guarantee

Proceedings of the VLDB Endowment
A clustering approach for structural k-anonymity in social networks using genetic algorithm

Proceedings of the CUBE International Information Technology Conference
Privacy preserving K-Medoids clustering: an approach towards securing data in Mobile cloud architecture

Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology
Worst- and average-case privacy breaches in randomization mechanisms

TCS'12 Proceedings of the 7th IFIP TC 1/WG 202 international conference on Theoretical Computer Science
Breaching Euclidean distance-preserving data perturbation using few known inputs

Data & Knowledge Engineering
A Knowledge Model Sharing Based Approach to Privacy-Preserving Data Mining

Transactions on Data Privacy
An association probability based noise generation strategy for privacy protection in cloud computing

ICSOC'12 Proceedings of the 10th international conference on Service-Oriented Computing
Secure Two-Party Association Rule Mining Based on One-Pass FP-Tree

International Journal of Information Security and Privacy
Bands of privacy preserving objectives: classification of PPDM strategies

AusDM '11 Proceedings of the Ninth Australasian Data Mining Conference - Volume 121
Denials leak information: Simulatable auditing

Journal of Computer and System Sciences
Trends and research directions for privacy preserving approaches on the cloud

Proceedings of the 6th ACM India Computing Convention
Solving inverse frequent itemset mining with infrequency constraints via large-scale linear programs

ACM Transactions on Knowledge Discovery from Data (TKDD)
A near-optimal algorithm for differentially-private principal components

The Journal of Machine Learning Research
A generic and distributed privacy preserving classification method with a worst-case privacy guarantee

Distributed and Parallel Databases

Abstract

There has been increasing interest in the problem of building accurate data mining models over aggregate data while protecting privacy at the level of individual records. One approach to this problem is to randomize the values in individual records and disclose only the randomized values. The model is then built over the randomized data, after first compensating for the randomization (at the aggregate level). This approach is potentially vulnerable to privacy breaches: based on the distribution of the data, one may be able to learn with high confidence that some of the randomized records satisfy a specified property, even though privacy is preserved on average.

In this paper, we present a new formulation of privacy breaches, together with a methodology, "amplification", for limiting them. Unlike earlier approaches, amplification makes it possible to guarantee limits on privacy breaches without any knowledge of the distribution of the original data. We instantiate this methodology for the problem of mining association rules, and modify the algorithm from [9] to limit privacy breaches without knowledge of the data distribution. Next, we address the problem that the amount of randomization required to avoid privacy breaches (when mining association rules) results in very long transactions. By using pseudorandom generators and carefully choosing seeds such that the desired items from the original transaction are present in the randomized transaction, we can send just the seed instead of the transaction, resulting in a dramatic drop in communication and storage cost. Finally, we define new information measures that take privacy breaches into account when quantifying the amount of privacy preserved by randomization.
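
The randomize-then-compensate idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's association-rule algorithm: it assumes each record is a single bit that is kept with probability `keep_prob` and flipped otherwise, and all function names are hypothetical. It also computes the worst-case ratio of transition probabilities for this operator, a quantity that depends only on the randomization and not on the data distribution. Treating that ratio as the kind of quantity amplification bounds is an assumption here, since the abstract does not give the definition.

```python
import random


def randomize_bit(bit, keep_prob, rng):
    """Return the bit unchanged with probability keep_prob, flipped otherwise."""
    return bit if rng.random() < keep_prob else 1 - bit


def estimate_fraction_of_ones(randomized_bits, keep_prob):
    """Estimate the original fraction of 1s from the disclosed (randomized) bits.

    If f is the true fraction, the expected observed fraction is
    f * keep_prob + (1 - f) * (1 - keep_prob); this solves that equation for f.
    Requires keep_prob != 0.5, otherwise the disclosed bits carry no signal.
    """
    observed = sum(randomized_bits) / len(randomized_bits)
    return (observed - (1 - keep_prob)) / (2 * keep_prob - 1)


def max_transition_ratio(keep_prob):
    """Largest ratio p[x1 -> y] / p[x2 -> y] over inputs x1, x2 and outputs y.

    Requires 0 < keep_prob < 1. The result depends only on the operator,
    not on how the original bits are distributed.
    """
    return max(keep_prob, 1 - keep_prob) / min(keep_prob, 1 - keep_prob)


def demo(n=100_000, true_fraction=0.3, keep_prob=0.75, seed=0):
    """Randomize n synthetic bits and return (actual fraction, estimated fraction)."""
    rng = random.Random(seed)
    original = [1 if rng.random() < true_fraction else 0 for _ in range(n)]
    disclosed = [randomize_bit(b, keep_prob, rng) for b in original]
    return sum(original) / n, estimate_fraction_of_ones(disclosed, keep_prob)


if __name__ == "__main__":
    actual, estimated = demo()
    print(f"actual fraction of 1s:    {actual:.4f}")
    print(f"estimated from disclosed: {estimated:.4f}")
    print(f"max transition ratio:     {max_transition_ratio(0.75):.1f}")
```

In this sketch, moving `keep_prob` closer to 0.5 lowers the transition ratio (each disclosed bit reveals less about its record) but makes the aggregate estimate noisier, which is the privacy/accuracy trade-off the abstract alludes to.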