On the effectiveness of early life cycle defect prediction with Bayesian Nets

Authors:
Norman Fenton;Martin Neil;William Marsh;Peter Hearty;Łukasz Radliński;Paul Krause
Affiliations:
Department of Computer Science, Queen Mary, University of London, London, UK;Department of Computer Science, Queen Mary, University of London, London, UK;Department of Computer Science, Queen Mary, University of London, London, UK;Department of Computer Science, Queen Mary, University of London, London, UK;Department of Computer Science, Queen Mary, University of London, London, UK and Institute of Information Technology in Management, University of Szczecin, Szczecin, Poland;Department of Computing, University of Surrey, Guildford, UK
Venue:
Empirical Software Engineering
Year:
2008


Abstract

Standard practice in building models in software engineering normally involves three steps: (1) collecting domain knowledge (previous results, expert knowledge); (2) building a skeleton of the model based on step 1, including as yet unknown parameters; (3) estimating the model parameters using historical data. Our experience shows that it is extremely difficult to obtain reliable data of the required granularity, or in the volume needed to generalize our conclusions later. Therefore, in searching for a model-building method, we cannot consider methods that require large volumes of data. This paper discusses an experiment to develop a causal model (Bayesian net) for predicting the number of residual defects that are likely to be found during independent testing or operational usage. The approach supports steps (1) and (2), does not require step (3), yet still makes accurate defect predictions (an R² of 0.93 between predicted and actual defects). Since our method does not require detailed domain knowledge, it can be applied very early in the process life cycle. The model takes as inputs a set of quantitative and qualitative factors describing a project and its development process. The model variables, as well as the relationships between them, were identified as part of a major collaborative project. A dataset, elicited from 31 completed software projects in the consumer electronics industry, was gathered using a questionnaire distributed to managers of recent projects. We used this dataset to validate the model by analyzing several popular evaluation measures (R², measures based on the relative error, and Pred). The validation results also confirm the need for using the qualitative factors in the model. The dataset may be of interest to other researchers evaluating models with similar aims. Based on some typical scenarios, we demonstrate how the model can be used for better decision support in operational environments.
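The evaluation measures named above can be illustrated with a minimal Python sketch. It assumes that R² means the squared Pearson correlation between actual and predicted defect counts, that the relative-error measures are the mean and median magnitude of relative error (MMRE, MdMRE), and that Pred(l) is the fraction of projects whose relative error is at most l. The abstract does not spell out these definitions, and the function names and sample values are hypothetical, not taken from the 31-project dataset.

```python
from statistics import mean, median


def r_squared(actual, predicted):
    """Squared Pearson correlation between actual and predicted values."""
    mean_a, mean_p = mean(actual), mean(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_a * var_p)


def mre(actual, predicted):
    """Magnitude of relative error for each project (actual must be non-zero)."""
    return [abs(a - p) / a for a, p in zip(actual, predicted)]


def mmre(actual, predicted):
    """Mean magnitude of relative error."""
    return mean(mre(actual, predicted))


def mdmre(actual, predicted):
    """Median magnitude of relative error."""
    return median(mre(actual, predicted))


def pred(actual, predicted, level=0.25):
    """Fraction of projects whose relative error is at most `level`."""
    errors = mre(actual, predicted)
    return sum(e <= level for e in errors) / len(errors)


# Made-up defect counts for four imaginary projects (illustration only).
actual = [100, 200, 400, 50]
predicted = [110, 180, 400, 80]

print("R^2      :", r_squared(actual, predicted))
print("MMRE     :", mmre(actual, predicted))
print("MdMRE    :", mdmre(actual, predicted))
print("Pred(25) :", pred(actual, predicted, 0.25))
```

For these made-up values the relative errors are 0.1, 0.1, 0.0 and 0.6, so MMRE is about 0.2 and Pred(25) is 0.75. The single poorly predicted project raises the mean error far more than the median, which is one reason several measures are usually reported together.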
We also performed a sensitivity analysis to identify the variables with the most influence on the number of residual defects. It showed that project size, scale of distributed communication, and project complexity cause most of the variation in the number of defects in our model. We make both the dataset and causal model available for research use.
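One simple way to rank input variables by influence is a one-at-a-time analysis: fix one input to each of its states in turn, compute the expected number of defects, and compare the spread of those expectations across inputs. The sketch below does this for a toy discrete model with two independent inputs. The abstract does not describe the sensitivity method actually used, so this is only an assumed illustration; the variable names, states, probabilities and defect figures are all hypothetical.

```python
from itertools import product

# Hypothetical prior probabilities over input states (not from the paper).
priors = {
    "size": {"small": 0.5, "large": 0.5},
    "complexity": {"low": 0.6, "high": 0.4},
}

# Hypothetical expected residual defects for each (size, complexity) pair.
expected_defects = {
    ("small", "low"): 20.0,
    ("small", "high"): 40.0,
    ("large", "low"): 100.0,
    ("large", "high"): 180.0,
}


def conditional_mean(variable, state):
    """E[defects | variable = state], assuming the inputs are independent."""
    names = list(priors)
    total = 0.0
    for combo in product(*(priors[name] for name in names)):
        assignment = dict(zip(names, combo))
        if assignment[variable] != state:
            continue
        # Weight by the prior probability of the other inputs' states.
        weight = 1.0
        for name in names:
            if name != variable:
                weight *= priors[name][assignment[name]]
        total += weight * expected_defects[combo]
    return total


def sensitivity_ranges():
    """Spread of the conditional mean as each input sweeps its states."""
    ranges = {}
    for variable, states in priors.items():
        means = [conditional_mean(variable, state) for state in states]
        ranges[variable] = max(means) - min(means)
    return ranges


# Print inputs from most to least influential in this toy model.
for name, spread in sorted(sensitivity_ranges().items(), key=lambda kv: -kv[1]):
    print(f"{name}: expected defects vary by {spread:.1f}")
```

In this toy setup the expected defect count moves by about 104 when size changes and by about 50 when complexity changes, so size would be ranked as the more influential input. A real Bayesian net would obtain the conditional expectations by inference over the full network rather than from a lookup table.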