Improving accuracy of classification models induced from anonymized datasets

Authors:
Mark Last; Tamir Tassa; Alexandra Zhmudyak; Erez Shmueli
Venue:
Information Sciences: an International Journal
Year:
2014

Abstract

The performance of classifiers and other data mining models can be significantly enhanced using the large repositories of digital data now collected by public and private organizations. However, the original records stored in those repositories cannot be released to data miners, as they frequently contain sensitive information. The emerging field of Privacy Preserving Data Publishing (PPDP) addresses this challenge. In this paper, we present NSVDist (Non-homogeneous generalization with Sensitive Value Distributions), a new anonymization algorithm. Given minimal anonymity and diversity parameters along with an information loss measure, NSVDist issues corresponding non-homogeneous anonymizations in which the sensitive attribute is published as frequency distributions over the sensitive domain rather than in the usual form of exact sensitive values. In our experiments with eight datasets and four different classification algorithms, we show that classifiers induced from data generalized by NSVDist tend to be more accurate than classifiers induced from data produced by state-of-the-art anonymization algorithms.
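
To make the idea of publishing the sensitive attribute "as frequency distributions over the sensitive domain" concrete, the following minimal sketch computes such a distribution for one group of records. It is not the NSVDist algorithm: the grouping is supplied by hand, and the function name `sensitive_distribution`, the toy domain, and the toy records are all hypothetical choices made for illustration only.

```python
from collections import Counter


def sensitive_distribution(values, domain):
    """Return the relative frequency of each domain value within `values`.

    `values` holds the exact sensitive values of the records in one group;
    `domain` lists every possible sensitive value. Both are illustrative
    assumptions, not part of the paper's notation.
    """
    if not values:
        raise ValueError("the group must contain at least one record")
    counts = Counter(values)
    total = len(values)
    # Counter returns 0 for values that never occur in the group.
    return {v: counts[v] / total for v in domain}


# Hypothetical toy group of records that share one generalized description.
domain = ["flu", "asthma", "diabetes"]
group = ["flu", "flu", "asthma", "diabetes"]

# Instead of releasing the four exact values, release their distribution.
print(sensitive_distribution(group, domain))
# prints {'flu': 0.5, 'asthma': 0.25, 'diabetes': 0.25}
```

A published record would then carry this distribution in place of a single exact sensitive value. How records are grouped, and how the anonymity and diversity parameters constrain that grouping, is determined by the algorithm itself and is not modeled here.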