Scaling up a hybrid genetic linear programming algorithm for statistical disclosure control

Authors:
Martin C. Serpell; James E. Smith; Alistair R. Clark; Andrea T. Staggemeier
Affiliations:
University of the West of England, Bristol, United Kingdom; University of the West of England, Bristol, United Kingdom; University of the West of England, Bristol, United Kingdom; Office for National Statistics, Newport, United Kingdom
Venue:
Proceedings of the 13th annual conference on Genetic and evolutionary computation
Year:
2011

Abstract

This paper addresses the real-world problem of statistical disclosure control. National Statistics Agencies are required to publish detailed statistics while simultaneously guaranteeing the confidentiality of the contributors. When published statistical tables contain magnitude data, such as turnover or health statistics, the preferred method is to suppress the values of cells that may reveal confidential information. However, suppressing these 'primary' cells alone does not guarantee protection, because the margin (row/column) totals allow suppressed values to be recalculated; other 'secondary' cells must therefore also be suppressed. A previously developed algorithm that hybridizes linear programming with a genetic algorithm has been shown to protect tables with up to 40,000 cells, but Statistical Agencies are often required to protect tables with over 100,000 cells. The algorithm's performance depended heavily on the choice of mutation operator, so this dependency was removed first. The algorithm could not protect larger tables because of the time taken to execute its fitness function (a linear program), so a series of modifications was applied. These modifications significantly reduced its execution time, which in turn greatly extends the capabilities of the hybrid algorithm: it can now protect tables with up to one million cells.
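The reason primary suppression alone is insufficient can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function name `recover_from_margins`, the 3x3 table, and the chosen suppression pattern are all assumptions made for illustration. The sketch models only an attacker who recovers a cell by subtraction when it is the single gap in its row or column; it does not attempt the linear-programming check mentioned in the abstract, whose details are not given here.

```python
from typing import List, Optional

Table = List[List[Optional[float]]]  # None marks a suppressed cell (illustrative convention)


def recover_from_margins(cells: Table,
                         row_totals: List[float],
                         col_totals: List[float]) -> Table:
    """Fill in every suppressed cell that is the only gap in its row or column.

    Hypothetical helper: repeats until no further cell can be recovered.
    Cells that remain None could not be recovered by simple subtraction.
    """
    table = [row[:] for row in cells]  # work on a copy
    changed = True
    while changed:
        changed = False
        # A row with exactly one gap reveals that gap via the row total.
        for i, row in enumerate(table):
            gaps = [j for j, v in enumerate(row) if v is None]
            if len(gaps) == 1:
                known = sum(v for v in row if v is not None)
                row[gaps[0]] = row_totals[i] - known
                changed = True
        # The same argument applies to columns.
        for j in range(len(col_totals)):
            gaps = [i for i in range(len(table)) if table[i][j] is None]
            if len(gaps) == 1:
                known = sum(table[i][j] for i in range(len(table))
                            if table[i][j] is not None)
                table[gaps[0]][j] = col_totals[j] - known
                changed = True
    return table


# Published margins of an invented 3x3 table.
row_totals = [60, 150, 240]
col_totals = [120, 150, 180]

# Only the primary cell (0, 0) is suppressed: it is recoverable.
primary_only = [[None, 20, 30], [40, 50, 60], [70, 80, 90]]

# Secondary suppressions added so every affected row and column has two gaps.
with_secondary = [[None, None, 30], [None, None, 60], [70, 80, 90]]

print(recover_from_margins(primary_only, row_totals, col_totals)[0])
print(recover_from_margins(with_secondary, row_totals, col_totals)[0])
```

In the first case the suppressed value is recovered exactly from the row total; in the second, the subtraction attack leaves all four suppressed cells unknown. Surviving this simple attack does not by itself show that a table is adequately protected, since an attacker may still be able to bound the suppressed values tightly.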