A Framework for Evaluating Privacy Preserving Data Mining Algorithms*

  • Authors:
  • Elisa Bertino;Igor Nai Fovino;Loredana Parasiliti Provenza

  • Affiliations:
  • CERIAS and CS Department, Purdue University, Indiana, USA;Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano, Milan, Italy;Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano, Milan, Italy

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recently, a new class of data mining methods, known as privacy preserving data mining (PPDM) algorithms, has been developed by the research community working on security and knowledge discovery. The aim of these algorithms is the extraction of relevant knowledge from large amount of data, while protecting at the same time sensitive information. Several data mining techniques, incorporating privacy protection mechanisms, have been developed that allow one to hide sensitive itemsets or patterns, before the data mining process is executed. Privacy preserving classification methods, instead, prevent a miner from building a classifier which is able to predict sensitive data. Additionally, privacy preserving clustering techniques have been recently proposed, which distort sensitive numerical attributes, while preserving general features for clustering analysis. A crucial issue is to determine which ones among these privacy-preserving techniques better protect sensitive information. However, this is not the only criteria with respect to which these algorithms can be evaluated. It is also important to assess the quality of the data resulting from the modifications applied by each algorithm, as well as the performance of the algorithms. There is thus the need of identifying a comprehensive set of criteria with respect to which to assess the existing PPDM algorithms and determine which algorithm meets specific requirements.In this paper, we present a first evaluation framework for estimating and comparing different kinds of PPDM algorithms. Then, we apply our criteria to a specific set of algorithms and discuss the evaluation results we obtain. Finally, some considerations about future work and promising directions in the context of privacy preservation in data mining are discussed.