A computational algorithm for handling the special uniques problem

  • Authors:
  • M. J. Elliot; A. M. Manning; R. W. Ford

  • Affiliations:
  • Cathie Marsh Center for Census and Survey Research (CCSR), Manchester University, M13 9PL, UK; Department of Computer Science, Manchester University, M13 9PL, UK; Department of Computer Science, Manchester University, M13 9PL, UK

  • Venue:
  • International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
  • Year:
  • 2002

Abstract

Many organizations require detailed individual-level information, much of which has been collected under guarantees of confidentiality. However, simple anonymization procedures, such as removing names and addresses, are insufficient to ensure that confidentiality. The records belonging to certain individuals have a high probability of being identified because their contents, or attributes, are unusual, and they therefore have the potential to be recognized spontaneously; such records are referred to as special uniques. Consider, for example, a sixteen-year-old widow in a population survey. The confidentiality of a given dataset cannot be ensured until all special unique records have been identified and either disguised or removed. However, to the knowledge of the authors, no exhaustive automated analysis of this nature has been conducted, owing to the demanding levels of computation and data storage required. This paper introduces a new algorithm that locates 'risky' records in discrete data in two steps: first, it identifies all unique attribute sets (up to a user-specified maximum size); second, it grades the 'risk' of each record by considering the number and distribution of unique attribute sets within that record. Empirical tests indicate that the algorithm is highly effective at picking out 'risky' records from large samples of data.
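The two steps described in the abstract can be illustrated with a small brute-force sketch in Python. This is not the authors' algorithm, whose details the abstract does not give; it only shows the idea on toy data. The function names `unique_attribute_sets` and `risk_scores`, the restriction to minimal unique sets, and the 1/size weighting are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def unique_attribute_sets(records, max_size):
    """Map each record index to the attribute sets on which it is unique.

    Brute force: every combination of up to max_size attributes is checked.
    """
    n_attrs = len(records[0])
    found = {i: [] for i in range(len(records))}
    for size in range(1, max_size + 1):
        for cols in combinations(range(n_attrs), size):
            keys = [tuple(rec[c] for c in cols) for rec in records]
            counts = Counter(keys)
            for i, key in enumerate(keys):
                if counts[key] != 1:
                    continue  # another record shares these values
                # Assumption: keep only minimal sets, i.e. skip a set
                # that contains a smaller set already unique for record i.
                if any(set(prev) < set(cols) for prev in found[i]):
                    continue
                found[i].append(cols)
    return found


def risk_scores(found):
    """Hypothetical grading: each unique set of size s contributes 1 / s."""
    return {i: sum(1.0 / len(cols) for cols in sets)
            for i, sets in found.items()}


# Toy data: (age, marital status, sex). Record 0 is the sixteen-year-old widow.
records = [
    (16, "widowed", "F"),
    (16, "single", "F"),
    (45, "widowed", "F"),
    (45, "married", "M"),
    (45, "married", "M"),
    (16, "single", "M"),
]
found = unique_attribute_sets(records, max_size=2)
scores = risk_scores(found)
```

Here no single attribute value is unique, but record 0 is unique on the pair (age, marital status), so it receives a non-zero score, while the two identical records 3 and 4 score zero. The exhaustive enumeration grows combinatorially with the number of attributes, which is consistent with the abstract's remark about demanding computation and storage.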