Replication of defect prediction studies: problems, pitfalls and recommendations

Authors:
Thilo Mende
Affiliations:
University of Bremen, Germany
Venue:
Proceedings of the 6th International Conference on Predictive Models in Software Engineering
Year:
2010

Abstract

Background: The main goal of the PROMISE repository is to enable reproducible, and thus verifiable or refutable, research. Over time, many data sets have become available, especially for defect prediction problems. Aims: In this study, we investigate possible problems and pitfalls that occur during replication. This information can be used in future replication studies and can serve as a guideline for researchers reporting novel results. Method: We replicate two recent defect prediction studies comparing different data sets and learning algorithms, and report missing information and problems. Results: Even with access to the original data sets, replicating previous studies may not lead to exactly the same results. The choice of evaluation procedures, performance measures, and presentation has a large influence on reproducibility. Additionally, we show that trivial and random models can be used to identify overly optimistic evaluation measures. Conclusions: The best way to conduct easily reproducible studies is to share all associated artifacts, e.g., the scripts and programs used. When this is not an option, our results can be used to simplify the replication task for other researchers.
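The remark about trivial and random models can be illustrated with a small sketch. This is not the procedure or data used in the paper: the data set below is synthetic (a hypothetical 1000 modules with a 10% defect rate), and the names `evaluate`, `baselines`, and `results` are chosen here for illustration only. The idea is simply to score models that learn nothing and check which measures still look good.

```python
import random


def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for two equal-length lists of booleans."""
    tp = fp = fn = tn = 0
    for is_defective, flagged in zip(actual, predicted):
        if is_defective and flagged:
            tp += 1
        elif flagged:
            fp += 1
        elif is_defective:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(actual, predicted):
    """Compute a few common performance measures from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "false_alarm": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical data (an assumption for this sketch, not a PROMISE data set):
# 1000 modules, of which the first 100 are defective.
actual = [i < 100 for i in range(1000)]

# Baselines that use no information about the modules at all.
rng = random.Random(0)  # fixed seed so the random baseline is repeatable
baselines = {
    "all_clean": [False] * len(actual),
    "all_defective": [True] * len(actual),
    "coin_flip": [rng.random() < 0.5 for _ in actual],
}

results = {name: evaluate(actual, pred) for name, pred in baselines.items()}

if __name__ == "__main__":
    for name, scores in results.items():
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

On this synthetic data, the `all_clean` baseline reaches an accuracy of 0.9 without finding a single defect, and the `all_defective` baseline reaches a recall of 1.0 at a false-alarm rate of 1.0. A measure on which such models score well, taken in isolation, is a candidate for being overly optimistic.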