Data transformation for privacy-preserving data mining

  • Author: Stanley Robson De Medeiros Oliveira
  • Affiliation: University of Alberta (Canada)
  • Venue: Data transformation for privacy-preserving data mining
  • Year: 2005

Abstract

The sharing of data is often beneficial in data mining applications. It has proven useful both to support decision-making processes and to promote social goals. However, data sharing has also raised a number of ethical issues, including privacy, data security, and intellectual property rights. In this thesis, we focus primarily on privacy issues in data mining, notably when data are shared before mining. Specifically, we consider scenarios in which applications of association rule mining and data clustering require privacy safeguards. Addressing privacy preservation in such scenarios is complex: one must not only meet privacy requirements but also guarantee valid data mining results. This points to a pressing need to rethink the mechanisms that enforce privacy safeguards so that the benefits of mining are not lost. Such mechanisms can lead to new privacy control methods that convert a database into a new one while preserving the main features of the original database for mining. In particular, we address the problem of transforming a database to be shared into a new one that conceals private information while preserving the general patterns and trends of the original database. To address this problem, we propose a unified framework for privacy-preserving data mining that ensures, up to a certain degree of security, that the mining process will not violate privacy. The framework encompasses a family of privacy-preserving data transformation methods, a library of algorithms, retrieval facilities to speed up the transformation process, and a set of metrics that evaluate the effectiveness of the proposed algorithms in terms of information loss and quantify how much private information has been disclosed. Our investigation concludes that privacy-preserving data mining is, to some extent, possible.
We demonstrate, both empirically and theoretically, that privacy preservation in data mining is practical and feasible. Our experiments show that our framework is effective, meets privacy requirements, and guarantees valid data mining results while protecting sensitive information (e.g., sensitive knowledge and individuals' privacy).
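The abstract does not spell out the transformation algorithms or metric formulas. As a minimal sketch of the general idea (transforming a transaction database so that a sensitive association pattern is no longer frequent, then measuring disclosure and information loss), the code below uses a naive item-deletion heuristic and simplified versions of two measures used in this area, hiding failure and misses cost. The heuristic, the function names (`sanitize`, `hiding_failure`, `misses_cost`), the toy data, and the exact metric definitions are hypothetical illustrations, not the algorithms of the thesis.

```python
from itertools import combinations


def support_count(db, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def frequent_itemsets(db, min_count, max_len=3):
    """Brute-force enumeration of frequent itemsets (adequate for toy data only)."""
    items = sorted(set().union(*db))
    frequent = set()
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            if support_count(db, candidate) >= min_count:
                frequent.add(candidate)
    return frequent


def sanitize(db, sensitive, min_count):
    """Hypothetical heuristic: for each sensitive itemset, delete one of its
    items (the alphabetically first) from supporting transactions until the
    itemset's support count drops below `min_count`."""
    new_db = [set(t) for t in db]  # copy, so the original database is untouched
    for s in sensitive:
        victim = min(s)
        for t in new_db:
            if support_count(new_db, s) < min_count:
                break
            if s <= t:
                t.discard(victim)
    return new_db


def hiding_failure(sensitive, freq_orig, freq_sanit):
    """Simplified disclosure measure: fraction of sensitive itemsets frequent
    in D that are still frequent in D'."""
    before = [s for s in sensitive if s in freq_orig]
    after = [s for s in before if s in freq_sanit]
    return len(after) / len(before) if before else 0.0


def misses_cost(sensitive, freq_orig, freq_sanit):
    """Simplified information-loss measure: fraction of non-sensitive frequent
    itemsets of D that are no longer frequent in D'. Supersets of a sensitive
    itemset are excluded, since hiding the sensitive itemset hides them too."""
    legit = {f for f in freq_orig if not any(s <= f for s in sensitive)}
    lost = legit - freq_sanit
    return len(lost) / len(legit) if legit else 0.0


# Toy transaction database (hypothetical data).
db = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
sensitive = [frozenset({"diapers", "beer"})]
min_count = 2

freq_d = frequent_itemsets(db, min_count)
db_prime = sanitize(db, sensitive, min_count)
freq_d_prime = frequent_itemsets(db_prime, min_count)

print(hiding_failure(sensitive, freq_d, freq_d_prime))         # 0.0
print(f"{misses_cost(sensitive, freq_d, freq_d_prime):.3f}")   # 0.214
```

In this toy run the sensitive pattern is fully hidden (hiding failure 0), at the price of losing 3 of 14 legitimate frequent itemsets. Because the heuristic only deletes items, it cannot create new frequent itemsets; a real framework would also choose victim items and transactions more carefully to lower the information loss.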