C4.5: Programs for Machine Learning.
Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations.
Machine Learning.
Pattern Classification (2nd Edition).
State-of-the-art in privacy preserving data mining. ACM SIGMOD Record.
Privacy-Preserving Data Mining: Models and Algorithms.
Attacks on privacy and deFinetti's theorem. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data.
Protecting individual information against inference attacks in data publishing. In DASFAA'07: Proceedings of the 12th International Conference on Database Systems for Advanced Applications.
In this paper, we introduce a novel inference attack, which we call the reconstruction attack, whose objective is to reconstruct a probabilistic version of the original dataset on which a classifier was learnt, using only the description of this classifier and possibly some auxiliary information. In a nutshell, the reconstruction attack exploits the structure of the classifier to derive a probabilistic version of the dataset on which the model was trained. Moreover, we propose a general framework for assessing the success of a reconstruction attack in terms of a novel distance between the reconstructed and original datasets. For the case of multiple releases of classifiers, we also give a strategy for merging the different reconstructed datasets into a single coherent one that is closer to the original dataset than any of the individual reconstructed datasets. Finally, we instantiate the reconstruction attack on a decision tree classifier learnt with the C4.5 algorithm and evaluate its efficiency experimentally. The results of this evaluation demonstrate that the proposed attack is able to reconstruct a significant part of the original dataset, highlighting the need to develop new learning algorithms whose output is specifically tailored to mitigate the success of this type of attack.
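To make the idea concrete, the following minimal sketch shows how a white-box decision tree can leak a probabilistic view of its training set. The tree structure, attribute names, and counts below are purely illustrative stand-ins for a released C4.5-style model (which typically exposes split tests and per-leaf class counts); this is not the paper's actual algorithm, only an assumed simplification of the attack's core step.

```python
# Hypothetical sketch of the reconstruction step on a released decision tree.
# Each internal node carries a split test; each leaf carries the per-class
# training counts observed at that leaf (information C4.5-style output exposes).

def reconstruct(node, constraints=None):
    """Walk the tree and emit, for each leaf, the region of attribute space
    it covers together with the class counts stored at that leaf -- together
    these form a probabilistic version of the original training set."""
    if constraints is None:
        constraints = {}
    if "leaf" in node:  # leaf node: report its region and class counts
        return [(dict(constraints), node["leaf"])]
    attr, thr = node["split"]                 # internal node: attribute test
    lo = dict(constraints); lo[attr] = ("<=", thr)
    hi = dict(constraints); hi[attr] = (">", thr)
    return (reconstruct(node["left"], lo)
            + reconstruct(node["right"], hi))

# Toy released tree (illustrative attribute names and counts).
tree = {
    "split": ("age", 30),
    "left":  {"leaf": {"yes": 8, "no": 2}},
    "right": {"split": ("income", 50),
              "left":  {"leaf": {"yes": 1, "no": 5}},
              "right": {"leaf": {"yes": 6, "no": 0}}},
}

for region, counts in reconstruct(tree):
    print(region, counts)
```

Each emitted pair says, e.g., "10 training records satisfy age <= 30, of which 8 are labelled yes", which is exactly the kind of probabilistic partial reconstruction the abstract describes; merging such outputs from multiple released classifiers would further narrow the regions.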