Using Anonymized Data for Classification

Authors:
Ali Inan;Murat Kantarcioglu;Elisa Bertino
Venue:
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Year:
2009

Citing 0
Cited 5

Privacy-preserving outsourcing support vector machines with random transformation
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

Publishing time-series data under preservation of privacy and distance orders
DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part II

An information theoretic approach for privacy metrics
Transactions on Data Privacy

A semantic information loss metric for privacy preserving publication
DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part II

Information preservation in statistical privacy and bayesian estimation of unattributed histograms
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data

Abstract

In recent years, anonymization methods have emerged as an important tool for preserving individual privacy when releasing privacy-sensitive data sets. This interest in anonymization techniques has resulted in a plethora of methods for anonymizing data under different privacy and utility assumptions. At the same time, there has been little research addressing how to effectively use the anonymized data for data mining in general and for distributed data mining in particular. In this paper, we propose a new approach for building classifiers using anonymized data by modeling anonymized data as uncertain data. In our method, we do not assume any probability distribution over the data. Instead, we propose collecting all necessary statistics during anonymization and releasing these together with the anonymized data. We show that releasing such statistics does not violate anonymity. Experiments spanning various alternatives in both local and distributed data mining settings reveal that our method performs better than heuristic approaches for handling anonymized data.
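
The idea of releasing statistics alongside anonymized values can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which statistics are collected or how the classifier consumes them, so the grouping rule, the choice of mean and variance, and the names `anonymize_with_stats` and `expected_sq_distance` are hypothetical. The sketch generalizes one numeric attribute into groups of at least k records, publishes each group's interval together with its mean and variance, and then uses the identity E[(X - q)^2] = Var(X) + (E[X] - q)^2 to compute an expected squared distance to an uncertain record without assuming a distribution inside the interval.

```python
from statistics import fmean, pvariance

def anonymize_with_stats(values, k):
    """Hypothetical helper: generalize values into groups of at least k records
    and release, per group, the interval plus the mean and variance."""
    ordered = sorted(values)
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    # Merge a short trailing group into its predecessor so each group has >= k members.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    return [
        {
            "interval": (g[0], g[-1]),  # generalized value published for the group
            "count": len(g),
            "mean": fmean(g),           # statistic collected during anonymization
            "var": pvariance(g),        # population variance of the original values
        }
        for g in groups
    ]

def expected_sq_distance(group, q):
    """Expected squared distance from q to a record of the group, treating the
    record as uncertain: E[(X - q)^2] = Var(X) + (E[X] - q)^2."""
    return group["var"] + (group["mean"] - q) ** 2

# Illustrative data only; not taken from the paper.
ages = [23, 25, 31, 34, 35, 41, 47]
released = anonymize_with_stats(ages, k=3)
for group in released:
    print(group["interval"], group["count"], expected_sq_distance(group, 30))
```

A distance-based classifier such as nearest neighbor could plug in `expected_sq_distance` where it would normally use an exact distance. A heuristic alternative would replace each interval by its midpoint, which ignores how the original values were actually spread within the group.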