On the security of individual data

  • Authors:
  • J. Demetrovics;G. O. Katona;D. Miklós

  • Affiliations:
  • Computer and Automation Institute, Hungarian Academy of Sciences, Budapest, Hungary H-1111;Alfréd Rényi Institute of Mathematics, HAS, Budapest, Hungary H-1364;Alfréd Rényi Institute of Mathematics, HAS, Budapest, Hungary H-1364

  • Venue:
  • Annals of Mathematics and Artificial Intelligence
  • Year:
  • 2006

Abstract

We consider the following problem: assume that $$n$$ numerical data $$\{x_1,x_2,\ldots,x_n\}$$ (such as the salaries of $$n$$ individuals) are stored in a database, and that some subsums of these numbers are made public or are simply available to persons not eligible to learn the original data. Our motivating question is: at most how many of these subsums may be disclosed such that none of the numbers $$x_1,x_2,\ldots,x_n$$ can be uniquely determined from these sums? Problems of this type arise when certain tasks concerning a database are carried out by subcontractors who are not eligible to learn the elements of the database but naturally must be given some data to fulfill their task. In database theory such examples are called statistical databases, as they are used for statistical purposes and no individual data are supposed to be obtainable using a restricted list of SUM queries. This problem was introduced in [1], first solved by Miller et al. [7], and revisited by Griggs [4, 5]. It was shown in [7] that no more than $${n\choose n/2}$$ subsums of a given set of secure data may be disclosed without disclosing at least one of the data, and this upper bound is sharp. Calculating a subsum may require a number of operations, and that number is limited. It is therefore natural to assume that each disclosed subsum contains only a limited number of the original elements of the database, say at most $$k$$. The goal of the present paper is to determine the maximum number of subsums of size at most $$k$$ that can be disclosed without making it possible to calculate any of the individual data $$x_i$$. The maximum is determined exactly for the case when the number of data is much larger than the size restriction $$k$$.
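
To make the notion of "uniquely determined" concrete, here is a minimal Python sketch. It is not taken from the paper. It assumes the data $$x_1,\ldots,x_n$$ may be arbitrary real numbers, so that a value $$x_i$$ is determined by the disclosed subsums exactly when the unit vector $$e_i$$ lies in the row space of the 0/1 matrix whose rows are the disclosed subsets. The function names `rank` and `determined_indices` are hypothetical, and indices are 0-based.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows), by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for c in range(ncols):
        # Look for a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c from all rows below the pivot row.
        for i in range(r + 1, len(m)):
            if m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def determined_indices(n, queries):
    """Return the indices i whose value x_i is forced by the disclosed subsums.

    `queries` is a list of sets of indices in range(n); each set is one SUM query.
    Assumption: the data are arbitrary reals, so x_i is determined exactly when
    appending the unit vector e_i to the query matrix does not raise its rank.
    """
    rows = [[1 if j in q else 0 for j in range(n)] for q in queries]
    base = rank(rows)
    result = []
    for i in range(n):
        unit = [1 if j == i else 0 for j in range(n)]
        if rank(rows + [unit]) == base:
            result.append(i)
    return result


# Two overlapping pair sums on three values reveal nothing individual.
print(determined_indices(3, [{0, 1}, {1, 2}]))
# Adding the third pair sum makes the system invertible: every value leaks.
print(determined_indices(3, [{0, 1}, {1, 2}, {0, 2}]))
# A sum and a sum over a superset with one more element leak that element.
print(determined_indices(3, [{0, 1}, {0, 1, 2}]))
```

In this sketch a family of SUM queries is safe to disclose when `determined_indices` returns an empty list. Restricting the queries to sets of size at most $$k$$ corresponds to the size-limited setting described above.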