Statistical database modeling for privacy preserving database generation

Authors:
Xintao Wu;Yongge Wang;Yuliang Zheng
Affiliations:
UNC Charlotte;UNC Charlotte;UNC Charlotte
Venue:
ISMIS'05 Proceedings of the 15th international conference on Foundations of Intelligent Systems
Year:
2005

Citing 3

A framework for testing database applications
Proceedings of the 2000 ACM SIGSOFT international symposium on Software testing and analysis

MUDD: a multi-dimensional data generator
WOSP '04 Proceedings of the 4th international workshop on Software and performance

Privacy preserving database application testing
Proceedings of the 2003 ACM workshop on Privacy in the electronic society

Cited 3

Towards value disclosure analysis in modeling general databases
Proceedings of the 2006 ACM symposium on Applied computing

Privacy Preserving Database Generation for Database Application Testing
Fundamenta Informaticae - Special issue ISMIS'05

Privacy Preserving Database Generation for Database Application Testing
Fundamenta Informaticae - Special issue ISMIS'05

Quantified Score

Hi-index	0.01

Abstract

Testing database applications is of great importance. Although various studies have investigated testing techniques for database design, relatively few have explicitly addressed the testing of database applications, which requires a large amount of representative data. Because testing over live production databases is often infeasible due to the high risk of disclosing confidential information or incorrectly updating real data, in this paper we investigate the problem of generating a synthetic database based on a priori knowledge about the production database. Our approach is to fit a general location model using various characteristics (e.g., constraints, statistics, rules) extracted from the production database and then generate synthetic data using the learned model. Because the extracted characteristics may contain information that an attacker could use to derive confidential information, we present a disclosure analysis method based on the cell suppression technique. Our method effectively and efficiently removes aggregate private information during data generation.
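
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the textbook form of the general location model: the categorical attributes define the cells of a contingency table with multinomial cell probabilities, and a numeric attribute is normally distributed within each cell, with a cell-specific mean and a variance shared across cells. For brevity it handles one numeric attribute and applies only primary cell suppression (dropping cells whose count falls below a threshold). Complementary suppression, which guards against recovering suppressed cells from marginal totals, is omitted. The function names, the `min_cell_count` parameter, and the column names in the usage example are all hypothetical.

```python
import random
from collections import defaultdict
from statistics import fmean


def fit_general_location_model(rows, cat_keys, num_key, min_cell_count=3):
    """Fit a one-numeric-attribute general location model (illustrative only).

    rows: list of dicts; cat_keys: categorical column names; num_key: numeric column.
    Returns (cell_probs, cell_means, pooled_var).
    """
    # Group the numeric values by the cell defined by the categorical attributes.
    cells = defaultdict(list)
    for row in rows:
        cell = tuple(row[k] for k in cat_keys)
        cells[cell].append(float(row[num_key]))

    # Primary cell suppression (assumed rule): drop cells with too few records.
    kept = {c: v for c, v in cells.items() if len(v) >= min_cell_count}
    if not kept:
        raise ValueError("every cell was suppressed")

    total = sum(len(v) for v in kept.values())
    cell_probs = {c: len(v) / total for c, v in kept.items()}
    cell_means = {c: fmean(v) for c, v in kept.items()}

    # Pooled within-cell variance, shared by all cells.
    sse = sum((x - cell_means[c]) ** 2 for c, v in kept.items() for x in v)
    dof = total - len(kept)
    pooled_var = sse / dof if dof > 0 else 0.0
    return cell_probs, cell_means, pooled_var


def generate_synthetic(model, cat_keys, num_key, n, seed=0):
    """Sample n synthetic rows: pick a cell, then draw the numeric value."""
    cell_probs, cell_means, pooled_var = model
    rng = random.Random(seed)
    cells = list(cell_probs)
    weights = [cell_probs[c] for c in cells]
    sd = pooled_var ** 0.5
    out = []
    for cell in rng.choices(cells, weights=weights, k=n):
        row = dict(zip(cat_keys, cell))
        row[num_key] = rng.gauss(cell_means[cell], sd)
        out.append(row)
    return out


if __name__ == "__main__":
    # Hypothetical production rows; the single ("M", "SC") record is suppressed.
    data = (
        [{"gender": "F", "state": "NC", "balance": b} for b in (100, 110, 90, 100)]
        + [{"gender": "M", "state": "NC", "balance": b} for b in (200, 210, 190, 200)]
        + [{"gender": "M", "state": "SC", "balance": 999}]
    )
    model = fit_general_location_model(data, ["gender", "state"], "balance")
    for r in generate_synthetic(model, ["gender", "state"], "balance", n=5):
        print(r)
```

Because the model is fitted only on cells that survive suppression, the generator can never emit a record from a suppressed cell, which is the sense in which aggregate information about small groups is removed before data generation in this sketch.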