Programmer-based fault prediction

Authors:
Thomas J. Ostrand; Elaine J. Weyuker; Robert M. Bell
Affiliations:
AT&T Labs - Research, Florham Park, NJ; AT&T Labs - Research, Florham Park, NJ; AT&T Labs - Research, Florham Park, NJ
Venue:
Proceedings of the 6th International Conference on Predictive Models in Software Engineering
Year:
2010



Abstract

Background: Previous research has provided evidence that a combination of static code metrics and software history metrics can be used to predict with surprising success which files in the next release of a large system will have the largest numbers of defects. In contrast, very little research exists to indicate whether information about individual developers can profitably be used to improve predictions.

Aims: We investigate whether files in a large system that are modified by an individual developer consistently contain either more or fewer faults than the average of all files in the system. The goal of the investigation is to determine whether information about which particular developer modified a file is able to improve defect predictions. We also continue an earlier study to evaluate the use of counts of the number of developers who modified a file as predictors of the file's future faultiness.

Method: We analyzed change reports filed by 107 programmers for 16 releases of a system with 1,400,000 LOC and 3,100 files. A "bug ratio" was defined for programmers, measuring the proportion of faulty files in release R out of all files modified by the programmer in release R-1. The study compares the bug ratios of individual programmers to the average bug ratio, and also assesses the consistency of the bug ratio across releases for individual programmers.

Results: Bug ratios varied widely among all the programmers, as well as for many individual programmers across all the releases that they participated in. We found a statistically significant correlation between the bug ratios for programmers for the first half of changed files versus the ratios for the second half, indicating a measurable degree of persistence in the bug ratio. However, when the computation was repeated with the bug ratio controlled not only by release, but also by file size, the correlation disappeared. In addition to the bug ratios, we confirmed that counts of the cumulative number of different developers changing a file over its lifetime can help to improve predictions, while other developer counts are not helpful.

Conclusions: The results from this preliminary study indicate that adding information to a model about which particular developer modified a file is not likely to improve defect predictions. The study is limited to a single large system, and its results may not hold more widely. The bug ratio is only one way of measuring the "fault-proneness" of an individual programmer's coding, and we intend to investigate other ways of evaluating bug introduction by individuals.
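
The "bug ratio" defined in the Method can be illustrated with a minimal Python sketch. This is not the authors' implementation: the input layout (tuples of programmer, release, file), the function names `bug_ratios` and `pooled_ratio`, and the toy data are all assumptions made for illustration. The abstract does not say how the "average bug ratio" is computed, so the pooled ratio over all files modified in a release is only one plausible reading.

```python
from collections import defaultdict

def bug_ratios(changes, faulty):
    """Per-programmer bug ratio, following the abstract's definition.

    changes: iterable of (programmer, release, file) modification records
             (hypothetical layout, not the paper's data format).
    faulty:  set of (release, file) pairs that had at least one fault.
    Returns {(programmer, R-1): share of files modified in R-1 that are faulty in R}.
    """
    modified = defaultdict(set)
    for programmer, release, path in changes:
        modified[(programmer, release)].add(path)

    ratios = {}
    for (programmer, release), files in modified.items():
        # A file counts as faulty if it has a fault in the NEXT release.
        n_faulty = sum((release + 1, f) in faulty for f in files)
        ratios[(programmer, release)] = n_faulty / len(files)
    return ratios

def pooled_ratio(changes, faulty, release):
    """Assumed baseline: the same ratio over all files modified in `release`."""
    files = {path for _, rel, path in changes if rel == release}
    return sum((release + 1, f) in faulty for f in files) / len(files)

# Toy data: two programmers modify files in release 1; faults observed in release 2.
changes = [
    ("alice", 1, "a.c"), ("alice", 1, "b.c"),
    ("bob", 1, "b.c"), ("bob", 1, "c.c"), ("bob", 1, "d.c"), ("bob", 1, "e.c"),
]
faulty = {(2, "a.c"), (2, "c.c")}

ratios = bug_ratios(changes, faulty)
baseline = pooled_ratio(changes, faulty, 1)
for (programmer, release), ratio in sorted(ratios.items()):
    print(programmer, release, ratio, "above" if ratio > baseline else "below")
```

In this toy data one programmer falls above the pooled ratio and one below. The sketch does not control for file size, which is the adjustment the Results report as removing the observed persistence.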