Comparing the effectiveness of several modeling methods for fault prediction

Authors:
Elaine J. Weyuker;Thomas J. Ostrand;Robert M. Bell
Affiliations:
AT&T Labs - Research, Florham Park, USA 07932;AT&T Labs - Research, Florham Park, USA 07932;AT&T Labs - Research, Florham Park, USA 07932
Venue:
Empirical Software Engineering
Year:
2010

Abstract

We compare the effectiveness of four modeling methods (negative binomial regression, recursive partitioning, random forests, and Bayesian additive regression trees) for predicting the files likely to contain the most faults for 28 to 35 releases of three large industrial software systems. Predictor variables included lines of code, file age, faults in the previous release, changes in the previous two releases, and programming language. To compare the effectiveness of the different models, we use two metrics: the percentage of faults contained in the top 20% of files identified by the model, and a new, more general metric, the fault-percentile-average. The negative binomial regression and random forests models performed significantly better than recursive partitioning and Bayesian additive regression trees, as assessed by either metric. For each of the three systems, the 20% of files identified by the negative binomial and random forests models in each release contained, on average, 76% to 94% of the faults.
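
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not define the fault-percentile-average, so the code assumes one plausible reading: rank files by predicted fault count, compute the share of actual faults captured by the top m files, and average that share over every cutoff m from 1 to the number of files. The function names, the round-down rule for the 20% cutoff, and the sample data are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def _rank_by_prediction(predicted: Sequence[float]) -> list[int]:
    """Return file indices ordered from highest to lowest predicted fault count."""
    return sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)


def top_fraction_fault_share(
    predicted: Sequence[float], actual: Sequence[int], fraction: float = 0.20
) -> float:
    """Share of actual faults found in the top `fraction` of files by prediction.

    Assumption: the cutoff is rounded down to a whole number of files.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    total = sum(actual)
    if total == 0:
        return 0.0
    order = _rank_by_prediction(predicted)
    cutoff = int(len(order) * fraction)
    return sum(actual[i] for i in order[:cutoff]) / total


def fault_percentile_average(
    predicted: Sequence[float], actual: Sequence[int]
) -> float:
    """Average, over every cutoff m, of the fault share in the top m files.

    Assumption: this is one reading of the metric, not a definition
    quoted from the paper.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    total = sum(actual)
    if total == 0 or not predicted:
        return 0.0
    captured = 0       # faults in the top m files so far
    share_sum = 0.0    # running sum of fault shares over all cutoffs
    for i in _rank_by_prediction(predicted):
        captured += actual[i]
        share_sum += captured / total
    return share_sum / len(predicted)


# Hypothetical release with ten files: model predictions and observed faults.
predicted = [5.0, 0.1, 3.0, 0.2, 1.0, 0.0, 0.3, 0.4, 0.5, 0.6]
actual = [6, 0, 2, 0, 1, 0, 0, 0, 1, 0]

print(top_fraction_fault_share(predicted, actual))
print(fault_percentile_average(predicted, actual))
```

Under this reading, the top-20% metric is a single point on the curve that the fault-percentile-average summarizes, which is consistent with the abstract's description of the latter as the more general metric.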