Testing software in age of data privacy: a balancing act
Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering
Heap cloning: Enabling dynamic symbolic execution of java programs
ASE '11 Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering
Privacy and utility for defect prediction: experiments with MORPH
Proceedings of the 34th International Conference on Software Engineering
Proceedings of the 34th International Conference on Software Engineering
kbe-anonymity: test data anonymization for evolving programs
Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering
Model checking database applications
TACAS'13 Proceedings of the 19th international conference on Tools and Algorithms for the Construction and Analysis of Systems
Data science for software engineering
Proceedings of the 2013 International Conference on Software Engineering
Guided test generation for database applications via synthesized database interactions
ACM Transactions on Software Engineering and Methodology (TOSEM)
An orchestrated survey of methodologies for automated software test case generation
Journal of Systems and Software
Hi-index | 0.00 |
Database-centric applications (DCAs) are common in enterprise computing, and they use nontrivial databases. Testing of DCAs is increasingly outsourced to test centers in order to achieve lower cost and higher quality. When releasing proprietary DCAs, its databases should also be made available to test engineers, so that they can test using real data. Testing with real data is important, since fake data lacks many of the intricate semantic connections among the original data elements. However, different data privacy laws prevent organizations from sharing these data with test centers because databases contain sensitive information. Currently, testing is performed with fake data that often leads to worse code coverage and fewer uncovered bugs, thereby reducing the quality of DCAs and obliterating benefits of test outsourcing. We show that a popular data anonymization algorithm called k-anonymity seriously degrades test coverage of DCAs. We propose an approach that uses program analysis to guide selective application of k-anonymity. This approach helps protect sensitive data in databases while retaining testing efficacy. Our results show that for small values of k = 7, test coverage drops to less than 30% from the original coverage of more than 70%, thus making it difficult to achieve good quality when testing DCAs while applying data privacy.