Novel micro-aggregation techniques for secure statistical databases

  • Authors:
  • Ebaa Fayyoumi

  • Affiliations:
  • Carleton University (Canada)

  • Venue:
  • Novel micro-aggregation techniques for secure statistical databases
  • Year:
  • 2008

Abstract

A Micro-Aggregation Technique (MAT) is a Statistical Disclosure Control scheme used to protect a Statistical Database (SDB). The aim of this Doctoral Thesis is to study the Micro-Aggregation Problem and to design and implement novel MATs that can prevent the disclosure of confidential information without significantly harming the utility of the data provided to the user. The research improves the general performance of MATs either by reducing the computation required, or by minimizing the Information Loss (IL), the Disclosure Risk (DR), or a composite measure of these two indices, the Scoring Index (SI).

This Thesis describes four new methodologies, which are its primary contributions:

(1) We have considered an existing MAT algorithm, the so-called k-Ward algorithm, and optimized it for large data sets by taking advantage of the distinct properties of the distance matrix and/or the principle of recursion.

(2) We have merged the rich fields of Learning Automata (LA) and MATs to present a novel Fixed-Structure LA that micro-aggregates a micro-data file. The proposed algorithm has been shown to be superior to the state-of-the-art methods.

(3) We have suggested how a neural network philosophy can lead to an enhanced MAT. To this end, we have investigated the effect of replacing the Euclidean distance, which is used to measure the similarity between individual records in the micro-data file, with the association and interaction rules that govern the neural network.

(4) We have proposed a methodology that uses the theory of causal networks and dependency to improve any MAT. The results of this preprocessing phase help solve a very difficult problem, namely that of determining the number and identity of the variables to be used in any micro-aggregation process.

The Thesis also lists various open problems and avenues for future research.
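This is a minimal Python sketch of a generic fixed-size micro-aggregation and a common Information Loss measure. It makes the basic MAT idea concrete: group at least k similar records and release each group's centroid in place of the originals. It is not the k-Ward, Learning Automata, neural-network or causal-network method of the Thesis.

The sketch rests on these assumptions:

- The helper names `microaggregate` and `information_loss` are hypothetical.
- Records are ordered by the sum of their attributes. This is a deliberately simple ordering chosen for illustration.
- IL is computed as 100 * SSE / SST. This choice is common in the micro-aggregation literature but is not confirmed by the abstract.
- DR and SI are not modelled.

```python
from statistics import fmean


def microaggregate(records, k):
    """Hypothetical fixed-size micro-aggregation (illustrative, not the Thesis's method).

    Records are sorted by the sum of their attributes (an assumed ordering),
    split into consecutive groups of k, and every record is replaced by its
    group's centroid. The last group absorbs any remainder, so each group
    holds between k and 2k - 1 records.
    """
    n = len(records)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= number of records")
    dims = len(records[0])
    # Indices of the records, ordered by attribute sum (stable for ties).
    order = sorted(range(n), key=lambda i: sum(records[i]))
    num_groups = n // k
    masked = [None] * n
    for g in range(num_groups):
        start = g * k
        # The final group takes every remaining record.
        end = n if g == num_groups - 1 else start + k
        members = order[start:end]
        centroid = tuple(fmean(records[i][j] for i in members) for j in range(dims))
        for i in members:
            masked[i] = centroid
    return masked


def information_loss(original, masked):
    """Assumed IL measure: 100 * SSE / SST.

    SSE is the squared distance between each original record and its masked
    value; SST is the squared distance of each original record from the
    overall mean.
    """
    dims = len(original[0])
    means = [fmean(r[j] for r in original) for j in range(dims)]
    sst = sum((r[j] - means[j]) ** 2 for r in original for j in range(dims))
    if sst == 0:
        # All records identical: aggregation loses nothing.
        return 0.0
    sse = sum((o[j] - m[j]) ** 2
              for o, m in zip(original, masked) for j in range(dims))
    return 100.0 * sse / sst
```

Consider `microaggregate([(1.0, 2.0), (2.0, 1.0), (10.0, 11.0), (11.0, 10.0), (5.0, 5.0)], 2)`. It replaces the first two records with (1.5, 1.5) and the remaining three with their common centroid. Passing the result to `information_loss` gives a value strictly between 0 and 100. By contrast, k = 1 gives an IL of 0, and k equal to the number of records gives 100.

Larger groups typically protect the data better but raise IL, which is the trade-off that a composite index such as the SI is meant to balance. Real MATs, including the k-Ward algorithm discussed in the Thesis, choose groups by minimising within-group distances rather than by a fixed sort.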