Privacy: A Machine Learning View

  • Authors:
  • Staal A. Vinterbo

  • Affiliations:
  • -

  • Venue:
  • IEEE Transactions on Knowledge and Data Engineering
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

The problem of disseminating a data set for machine learning while controlling the disclosure of data source identity is described using a commuting diagram of functions. This formalization is used to present and analyze an optimization problem balancing privacy and data utility requirements. The analysis points to the application of a generalization mechanism for maintaining privacy in view of machine learning needs. We present new proofs of NP-hardness of the problem of minimizing information loss while satisfying a set of privacy requirements, both with and without the addition of a particular uniform coding requirement. As an initial analysis of the approximation properties of the problem, we show that the cell suppression problem with a constant number of attributes can be approximated within a constant. As a side effect, proofs of NP-hardness of the minimum k{\hbox{-}}{\rm{union}}, maximum k{\hbox{-}}{\rm{intersection}}, and parallel versions of these are presented. Bounded versions of these problems are also shown to be approximable within a constant.