We compare the effectiveness of four modeling methods--negative binomial regression, recursive partitioning, random forests, and Bayesian additive regression trees--for predicting the files likely to contain the most faults across 28 to 35 releases of three large industrial software systems. Predictor variables included lines of code, file age, faults in the previous release, changes in the previous two releases, and programming language. To compare the effectiveness of the different models, we use two metrics--the percent of faults contained in the top 20% of files identified by the model, and a new, more general metric, the fault-percentile-average. The negative binomial regression and random forests models performed significantly better than recursive partitioning and Bayesian additive regression trees, as assessed by either metric. For each of the three systems, the negative binomial and random forests models identified 20% of the files in each release that together contained an average of 76% to 94% of the faults.
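To make the two evaluation metrics concrete, the following is a minimal Python sketch, assuming the fault-percentile-average is computed as the average, over every cutoff m, of the proportion of actual faults contained in the m files with the highest predicted fault counts; the function names and sample numbers are illustrative and not taken from the paper.

```python
def top_20_percent_faults(predicted, actual):
    """Percent of actual faults contained in the top 20% of files,
    ranked by predicted fault count (highest first)."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    cutoff = max(1, round(0.20 * len(order)))
    top = sum(actual[i] for i in order[:cutoff])
    total = sum(actual)
    return 100.0 * top / total if total else 0.0

def fault_percentile_average(predicted, actual):
    """Average, over all cutoffs m = 1..K, of the proportion of actual
    faults in the m files with the highest predicted fault counts."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    total = sum(actual)
    if total == 0:
        return 0.0
    running, acc = 0, 0.0
    for i in order:
        running += actual[i]
        acc += running / total
    return acc / len(order)

# Illustrative data: predicted fault counts per file vs. faults actually found.
predicted = [12.1, 0.4, 3.7, 0.1, 8.9]
actual    = [10,   0,   2,   1,   7]
print(top_20_percent_faults(predicted, actual))   # share of faults in the top 20% of files
print(fault_percentile_average(predicted, actual))
```

A ranking that places the most fault-laden files first drives both numbers toward their maximum (100% and 1.0, respectively), which is why either metric can be used to compare the four modeling methods.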