kb-anonymity: a model for anonymized behaviour-preserving test and debugging data

Authors:
Aditya Budi; David Lo; Lingxiao Jiang; Lucia
Affiliations:
Singapore Management University, Singapore, Singapore; Singapore Management University, Singapore, Singapore; Singapore Management University, Singapore, Singapore; Singapore Management University, Singapore, Singapore
Venue:
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Year:
2011


Abstract

It is often very expensive and practically infeasible to generate test cases that can exercise all possible program states in a program. This is especially true for a medium or large industrial system. In practice, industrial clients of the system often have a set of input data collected either before the system is built or after the deployment of a previous version of the system. Such data are highly valuable as they represent the operations that matter in a client's daily business and may be used to extensively test the system. However, such data often carry sensitive information and cannot be released to third-party development houses. For example, a healthcare provider may have a set of patient records that are strictly confidential and cannot be used by any third party. Masking sensitive values alone may not be sufficient, as the correlation among fields in the data can reveal the masked information. Also, masked data may exhibit different behavior in the system and become less useful than the original data for testing and debugging. For the purpose of releasing private data for testing and debugging, this paper proposes the kb-anonymity model, which combines the k-anonymity model commonly used in the data mining and database areas with the concept of program behavior preservation. Like k-anonymity, kb-anonymity replaces some information in the original data to ensure privacy preservation so that the replaced data can be released to third-party developers. Unlike k-anonymity, kb-anonymity ensures that the replaced data exhibit the same kind of program behavior as the original data, so that the replaced data may still be useful for testing and debugging. We also provide a concrete version of the model under three particular configurations and have successfully applied our prototype implementation to three open source programs, demonstrating the utility and scalability of our prototype.
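
The idea of combining k-anonymity with behavior preservation can be illustrated with a minimal sketch. It is not the paper's algorithm or prototype, only one possible reading of the abstract: original records are grouped by the branch path they take through a program, and a substitute record is released for a group only if the group has at least k members and the substitute follows the same path. The program `toy_program_path`, the helper `substitute_for`, the function `kb_release`, and the sample records are all hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def toy_program_path(record):
    """Hypothetical program under test: return the branch decisions it takes."""
    age, weight = record
    return (
        "adult" if age >= 18 else "minor",
        "heavy" if weight > 80 else "light",
    )


def substitute_for(path):
    """Hypothetical generator: pick fixed values that satisfy the given path."""
    age = 30 if path[0] == "adult" else 10
    weight = 90 if path[1] == "heavy" else 60
    return (age, weight)


def kb_release(records, k):
    """Release one substitute per behavior group that has at least k members."""
    groups = defaultdict(list)
    for record in records:
        groups[toy_program_path(record)].append(record)

    released = []
    for path, members in groups.items():
        if len(members) < k:
            continue  # too few records share this behavior: withhold the group
        candidate = substitute_for(path)
        # Behavior preservation: the substitute must take the same path.
        if toy_program_path(candidate) != path:
            continue
        # Privacy: never release a value that equals an original record.
        if candidate in members:
            continue
        released.append(candidate)
    return released


# Assumed sample data: (age, weight) pairs, invented for this sketch.
original = [(25, 85), (40, 95), (52, 100), (12, 40), (35, 70), (61, 72)]
released = kb_release(original, k=2)
print(released)
```

In this sketch the single record on the ("minor", "light") path is withheld because fewer than k = 2 originals share its behavior, while the other two groups each yield one substitute that takes the same branches as the records it stands for.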