On the complexity of the l-diversity problem

  • Authors:
  • Riccardo Dondi;Giancarlo Mauri;Italo Zoppis

  • Affiliations:
  • Dipartimento di Scienze dei Linguaggi, Università degli Studi di Bergamo, Bergamo, Italy;DISCo, Università degli Studi di Milano-Bicocca, Milano, Italy;DISCo, Università degli Studi di Milano-Bicocca, Milano, Italy

  • Venue:
  • MFCS'11 Proceedings of the 36th international conference on Mathematical foundations of computer science
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

The problem of publishing personal data without giving up privacy is becoming increasingly important. Different interesting formalizations have been recently proposed in this context, i.e. k-anonymity [17,18] and l-diversity [12]. These approaches require that the rows in a table are clustered in sets satisfying some constraint, in order to prevent the identification of the individuals the rows belong to. In this paper we focus on the l-diversity problem, where the possible attributes are distinguished in sensible attributes and quasi-identifier attributes. The goal is to partition the set of rows, where for each set C of the partition it is required that the number of rows having a specific value in the sensible attribute is at most 1/l |C|. We investigate the approximation and parameterized complexity of l-diversity. Concerning the approximation complexity, we prove the following results: (1) the problem is not approximable within factor c ln l, for some constant c 0, even if the input table consists of two columns; (ii) the problem is APX-hard, even if l = 4 and the input table contains exactly 3 columns; (iii) the problem admits an approximation algorithm of factor m (where m + 1 is the number of columns in the input table), when the sensitive attribute ranges over an alphabet of constant size. Concerning the parameterized complexity, we prove the following results: (i) the problem is W[1]-hard even if parameterized by the size of the solution, l, and the size of the alphabet; (ii) the problem admits a fixed-parameter algorithm when both the maximum number of different values in a column and the number of columns are parameters.