Is Data Privacy Always Good for Software Testing?

Authors:
Mark Grechanik; Christoph Csallner; Chen Fu; Qing Xie
Venue:
ISSRE '10 Proceedings of the 2010 IEEE 21st International Symposium on Software Reliability Engineering
Year:
2010

Citing 0
Cited 9

Testing software in age of data privacy: a balancing act
Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering

Heap cloning: Enabling dynamic symbolic execution of java programs
ASE '11 Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering

Privacy and utility for defect prediction: experiments with MORPH
Proceedings of the 34th International Conference on Software Engineering

Societal computing
Proceedings of the 34th International Conference on Software Engineering

kbe-anonymity: test data anonymization for evolving programs
Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering

Model checking database applications
TACAS'13 Proceedings of the 19th international conference on Tools and Algorithms for the Construction and Analysis of Systems

Data science for software engineering
Proceedings of the 2013 International Conference on Software Engineering

Guided test generation for database applications via synthesized database interactions
ACM Transactions on Software Engineering and Methodology (TOSEM)

An orchestrated survey of methodologies for automated software test case generation
Journal of Systems and Software

Abstract

Database-centric applications (DCAs) are common in enterprise computing, and they use nontrivial databases. Testing of DCAs is increasingly outsourced to test centers in order to lower cost and improve quality. When proprietary DCAs are released for testing, their databases should also be made available to test engineers so that they can test with real data. Testing with real data is important because fake data lacks many of the intricate semantic connections among the original data elements. However, various data privacy laws prevent organizations from sharing this data with test centers because the databases contain sensitive information. Currently, testing is performed with fake data, which often leads to worse code coverage and fewer bugs uncovered, thereby reducing the quality of DCAs and obliterating the benefits of test outsourcing. We show that a popular data anonymization algorithm called k-anonymity seriously degrades test coverage of DCAs. We propose an approach that uses program analysis to guide selective application of k-anonymity. This approach helps protect sensitive data in databases while retaining testing efficacy. Our results show that for a small value of k (k = 7), test coverage drops to less than 30% from the original coverage of more than 70%, making it difficult to achieve good quality when testing DCAs while applying data privacy.
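
To make the tension between k-anonymity and test coverage concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy table whose quasi-identifiers are `age` and `zip`, generalizes both until every combination occurs at least k times, and then checks whether a hypothetical application branch that depends on an exact ZIP code can still be reached. All names (`generalize_age`, `generalize_zip`, `is_k_anonymous`, `premium_branch`) and all data values are illustrative assumptions.

```python
from collections import Counter

# Hypothetical quasi-identifiers for this toy table (an assumption).
QUASI_IDENTIFIERS = ("age", "zip")


def generalize_age(age, width=10):
    """Replace an exact age with a range of the given width, e.g. 64 -> '60-69'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code, keep=3):
    """Keep only the first `keep` digits of a ZIP code and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def anonymize(records):
    """Generalize both quasi-identifiers in every record."""
    return [
        {**r, "age": generalize_age(r["age"]), "zip": generalize_zip(r["zip"])}
        for r in records
    ]


def is_k_anonymous(records, k):
    """True if every quasi-identifier combination appears in at least k records."""
    groups = Counter(tuple(r[q] for q in QUASI_IDENTIFIERS) for r in records)
    return all(count >= k for count in groups.values())


def premium_branch(record):
    """Hypothetical application branch that depends on an exact ZIP code."""
    return record["zip"] == "60605"


# Made-up sample data, not taken from the paper.
records = [
    {"age": 61, "zip": "60601"},
    {"age": 64, "zip": "60602"},
    {"age": 66, "zip": "60605"},
    {"age": 68, "zip": "60607"},
    {"age": 31, "zip": "10001"},
    {"age": 33, "zip": "10002"},
    {"age": 35, "zip": "10003"},
    {"age": 38, "zip": "10009"},
]

anonymized = anonymize(records)

original_is_2_anonymous = is_k_anonymous(records, 2)
anonymized_is_4_anonymous = is_k_anonymous(anonymized, 4)
branch_covered_original = any(premium_branch(r) for r in records)
branch_covered_anonymized = any(premium_branch(r) for r in anonymized)
```

In this sketch the generalized table satisfies k = 4, but no anonymized record can drive the ZIP-dependent branch any more. That is one simple way coverage can be lost, and it suggests why leaving untouched the attributes that the program's branches actually read, as a selective approach guided by program analysis would aim to do, can help preserve coverage.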