Privacy Preserving Database Generation for Database Application Testing

  • Authors:
  • Xintao Wu;Yongge Wang;Songtao Guo;Yuliang Zheng

  • Affiliations:
  • Department of Computer Science, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223-0001, USA. E-mail: xwu@uncc.edu;Department of Software and Information System, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223-0001, USA. E-mail: yonwang@uncc.edu/ sguo@uncc.edu/ yzheng@ ...;Department of Software and Information System, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223-0001, USA. E-mail: yonwang@uncc.edu/ sguo@uncc.edu/ yzheng@ ...;Department of Software and Information System, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223-0001, USA. E-mail: yonwang@uncc.edu/ sguo@uncc.edu/ yzheng@ ...

  • Venue:
  • Fundamenta Informaticae - Special issue ISMIS'05
  • Year:
  • 2007

Abstract

Testing of database applications is of great importance. Although various studies have investigated testing techniques for database design, relatively few efforts have explicitly addressed the testing of database applications, which requires a large amount of representative data. Because testing over live production databases is often infeasible due to the high risk of disclosing confidential information or incorrectly updating real data, in this paper we investigate the problem of generating synthetic databases based on a priori knowledge about production databases. Our approach is to fit the general location model using various characteristics (e.g., constraints, statistics, rules) extracted from a production database and then generate synthetic data using the learned model. The generated data is valid and similar to real data in terms of statistical distribution; hence it can be used for functional and performance testing. Because the extracted characteristics may contain information that attackers could use to derive confidential information about individuals, we present a disclosure analysis method that applies the cell suppression technique for identity disclosure analysis and perturbation for value disclosure analysis.
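As an illustration of the idea described in the abstract, the following is a minimal sketch of a general location model, not the authors' implementation. It assumes the usual formulation of that model: the categorical attributes define cells with multinomial probabilities, and within each cell the numeric attribute is Gaussian with a cell-specific mean and a variance shared across cells. To stay short it handles a single numeric attribute, so the shared covariance matrix reduces to one pooled variance. The function names `fit_location_model` and `generate`, the `min_cell_count` threshold, and the sample rows are all hypothetical. Dropping small cells is only a simplified stand-in for the cell suppression analysis the abstract mentions, and perturbation for value disclosure is not shown.

```python
import random
from collections import defaultdict


def fit_location_model(rows, min_cell_count=1):
    """Fit a univariate general location model.

    rows: iterable of (cell, value) pairs, where `cell` is a tuple of
    categorical attribute values and `value` is one numeric attribute.
    Cells with fewer than `min_cell_count` rows are dropped (an assumed,
    simplified form of suppression for rare, identifying combinations).
    """
    groups = defaultdict(list)
    for cell, value in rows:
        groups[cell].append(value)

    kept = {c: v for c, v in groups.items() if len(v) >= min_cell_count}
    total = sum(len(v) for v in kept.values())
    if total == 0:
        raise ValueError("no cells left after suppression")

    # Multinomial cell probabilities and per-cell means.
    probs = {c: len(v) / total for c, v in kept.items()}
    means = {c: sum(v) / len(v) for c, v in kept.items()}

    # Pooled within-cell variance, shared by all cells.
    squared_error = sum((x - means[c]) ** 2 for c, v in kept.items() for x in v)
    dof = total - len(kept)
    variance = squared_error / dof if dof > 0 else 0.0
    return probs, means, variance


def generate(model, n, seed=None):
    """Draw n synthetic (cell, value) rows from a fitted model."""
    probs, means, variance = model
    rng = random.Random(seed)
    cells = list(probs)
    weights = [probs[c] for c in cells]
    sigma = variance ** 0.5
    synthetic = []
    for _ in range(n):
        cell = rng.choices(cells, weights=weights)[0]  # pick a cell
        synthetic.append((cell, rng.gauss(means[cell], sigma)))  # then a value
    return synthetic


if __name__ == "__main__":
    # Hypothetical production rows: ((gender, state), salary in thousands).
    production = [
        (("F", "NC"), 50.0), (("F", "NC"), 54.0),
        (("M", "NC"), 70.0), (("M", "NC"), 74.0),
        (("M", "SC"), 99.0),  # singleton cell, dropped when min_cell_count=2
    ]
    model = fit_location_model(production, min_cell_count=2)
    for row in generate(model, 5, seed=0):
        print(row)
```

Only the fitted summary (cell probabilities, means, and the pooled variance) is needed to generate test data, which is why the disclosure analysis in the abstract targets the extracted characteristics rather than the raw rows.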