Anonymizing Classification Data for Privacy Preservation

  • Authors:
  • Benjamin C. M. Fung; Ke Wang; Philip S. Yu

  • Venue:
  • IEEE Transactions on Knowledge and Data Engineering
  • Year:
  • 2007

Abstract

Classification is a fundamental problem in data analysis. Training a classifier requires accessing a large collection of data. Releasing person-specific data, such as customer data or patient records, may pose a threat to an individual's privacy. Even after removing explicit identifying information such as Name and SSN, it is still possible to link released records back to their identities by matching some combination of nonidentifying attributes such as {Sex, Zip, Birthdate}. A useful approach to combat such linking attacks, called k-anonymization [1], is anonymizing the linking attributes so that at least k released records match each value combination of the linking attributes. Previous work attempted to find an optimal k-anonymization that minimizes some data distortion metric. We argue that minimizing the distortion to the training data is not relevant to the classification goal, which requires extracting the structure of prediction on the "future" data. In this paper, we propose a k-anonymization solution for classification. Our goal is to find a k-anonymization, not necessarily optimal in the sense of minimizing data distortion, that preserves the classification structure. We conducted intensive experiments to evaluate the impact of anonymization on the classification of future data. Experiments on real-life data show that the quality of classification can be preserved even for highly restrictive anonymity requirements.
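
The k-anonymity property described in the abstract can be illustrated with a minimal sketch: after generalizing the linking attributes, every value combination of those attributes must be shared by at least k released records. The sketch below is an illustration under assumed record layout and attribute names (Sex, Zip, Birthdate), not the authors' anonymization algorithm.

```python
# Minimal sketch of checking k-anonymity over a set of quasi-identifiers.
# Attribute names and the toy table are illustrative assumptions only.
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier value combination occurs >= k times."""
    groups = Counter(
        tuple(rec[attr] for attr in quasi_identifiers) for rec in records
    )
    return all(count >= k for count in groups.values())

# Toy released table after generalizing Zip to a prefix and Birthdate to a year.
released = [
    {"Sex": "F", "Zip": "537**", "Birthdate": "1975", "Class": "Yes"},
    {"Sex": "F", "Zip": "537**", "Birthdate": "1975", "Class": "No"},
    {"Sex": "M", "Zip": "537**", "Birthdate": "1980", "Class": "Yes"},
    {"Sex": "M", "Zip": "537**", "Birthdate": "1980", "Class": "Yes"},
]

print(is_k_anonymous(released, ["Sex", "Zip", "Birthdate"], k=2))  # True
```

The paper's contribution is in how the generalization is chosen: rather than minimizing a distortion metric on the training data, it selects generalizations that preserve the structure needed to classify future records.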