Ideally, we could learn lessons from software projects across multiple organizations. However, a major impediment to such knowledge sharing is the privacy concerns of software development organizations. This paper aims to provide owners of defect data-sets with an effective means of privatizing their data prior to release. We explore MORPH, a data mutator that understands how to maintain class boundaries in a data-set: it moves the data a random distance, taking care not to cross class boundaries. The value of training on this MORPHed data is tested via a 10-way within-learning study and a cross-learning study, using Random Forests, Naive Bayes, and Logistic Regression on ten object-oriented defect data-sets from the PROMISE data repository. Measured in terms of exposure of sensitive attributes, the MORPHed data was four times more private than the unMORPHed data. In terms of f-measure, there was little difference between the MORPHed and unMORPHed data (the original data and data privatized by data-swapping) in both the cross and within studies. We conclude that, at least for the kinds of OO defect data studied in this project, data can be privatized without concerns for inference efficacy.
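The idea of moving data a random distance without crossing class boundaries can be illustrated with a minimal sketch. It is not taken from the abstract. It assumes one plausible reading: each row is shifted by a random fraction of its distance to the nearest row of a different class (its "nearest unlike neighbor"), and that fraction stays below one half, so the moved row remains closer to its original position than to that neighbor. The function names, the `lo`/`hi` range, and the choice of Euclidean distance are all illustrative assumptions.

```python
import math
import random


def nearest_unlike_neighbor(i, rows, labels):
    """Return the row with a different label that is closest to rows[i].

    Returns None when every row shares the same label.
    """
    best, best_dist = None, math.inf
    for j, other in enumerate(rows):
        if labels[j] == labels[i]:
            continue  # same class: not an "unlike" neighbor
        d = math.dist(rows[i], other)
        if d < best_dist:
            best, best_dist = other, d
    return best


def morph_sketch(rows, labels, lo=0.15, hi=0.35, seed=None):
    """Mutate each numeric row by a bounded random step (hypothetical sketch).

    lo and hi are assumed bounds on the step, expressed as a fraction of the
    distance to the nearest unlike neighbor. Keeping hi below 0.5 means a
    mutated row stays closer to its original than to that neighbor.
    """
    rng = random.Random(seed)
    mutated = []
    for i, x in enumerate(rows):
        z = nearest_unlike_neighbor(i, rows, labels)
        if z is None:
            # Only one class present, so there is no boundary to respect;
            # this sketch leaves the row unchanged.
            mutated.append(list(x))
            continue
        r = rng.uniform(lo, hi)          # random step size (fraction)
        sign = rng.choice((-1.0, 1.0))   # move toward or away from z
        mutated.append([xi + sign * r * (xi - zi) for xi, zi in zip(x, z)])
    return mutated
```

A call such as `morph_sketch(rows, labels, seed=1)` returns a new list of rows in the same order, so the original class labels can be reattached unchanged. Whether such a heuristic preserves learner performance on a given data-set is an empirical question of the kind the study above examines.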