Revisiting the evaluation of defect prediction models

  • Authors:
  • Thilo Mende; Rainer Koschke

  • Affiliations:
  • University of Bremen, Germany; University of Bremen, Germany

  • Venue:
  • PROMISE '09: Proceedings of the 5th International Conference on Predictor Models in Software Engineering
  • Year:
  • 2009

Abstract

Defect prediction models aim at identifying error-prone parts of a software system as early as possible. Many such models have been proposed, but their evaluation, as recent publications show, is still an open question. An important aspect often ignored during evaluation is the effort reduction gained by using such models. Models are usually evaluated per module with performance measures from information retrieval, such as recall, precision, or the area under the ROC curve (AUC). These measures assume that the costs associated with additional quality assurance activities are the same for each module, which is not realistic in practice: costs for unit testing and code reviews, for example, are roughly proportional to the size of a module. In this paper, we investigate this discrepancy using optimal and trivial models. We describe a trivial model that takes only the module size, measured in lines of code, into account, and compare it to five classification methods. The trivial model performs surprisingly well when evaluated with AUC. However, when an effort-sensitive performance measure is used, it becomes apparent that the trivial model is in fact the worst.
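
To make the discrepancy concrete, the following Python sketch (not taken from the paper; the module sizes, defect labels, and the 20% inspection budget are invented for illustration) scores modules with a trivial LOC-based model, computes its module-level AUC, and then takes an effort-aware view by tracking how many defective modules are found per fraction of lines of code inspected, assuming QA cost is proportional to module size.

```python
# Minimal sketch contrasting module-based AUC with an effort-aware view.
# Assumptions (not from the paper): synthetic module sizes and defect
# labels; the "trivial" model scores each module by its LOC.

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 200
loc = rng.integers(20, 2000, size=n)                  # module size in LOC
# larger modules are given a higher chance of being defective
defective = rng.random(n) < np.clip(loc / 4000, 0.02, 0.6)

trivial_score = loc.astype(float)                     # trivial LOC-based model

# 1) Conventional, module-based evaluation: AUC treats every module equally.
print("AUC of trivial model:", roc_auc_score(defective, trivial_score))

# 2) Effort-aware view: inspect modules in score order and track the
#    fraction of defective modules found against the fraction of LOC
#    inspected (QA cost assumed proportional to module size).
order = np.argsort(-trivial_score)
effort = np.cumsum(loc[order]) / loc.sum()
found = np.cumsum(defective[order]) / defective.sum()
idx = np.searchsorted(effort, 0.2)                    # 20% LOC budget
print("Defects found at 20% of LOC:", found[idx])
```

Because the trivial model ranks the largest modules first, it can achieve a high AUC while consuming most of the inspection budget on very few modules, which is exactly the effect an effort-sensitive measure penalizes.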