A Comprehensive Evaluation of Capture-Recapture Models for Estimating Software Defect Content

Authors:
Lionel C. Briand; Khaled El Emam; Bernd G. Freimut; Oliver Laitenberger
Affiliations:
Carleton Univ., Ottawa, Ont., Canada; National Research Council of Canada, Ottawa, Ont., Canada; Fraunhofer Experimental Software Engineering, Kaiserslautern-Siegelbach, Germany; Fraunhofer Experimental Software Engineering, Kaiserslautern-Siegelbach, Germany
Venue:
IEEE Transactions on Software Engineering
Year:
2000

Abstract

An important requirement for controlling the inspection of software artifacts is the ability to decide, on the basis of more objective information, whether an inspection can stop or should continue in order to achieve a suitable level of artifact quality. A prediction of the number of defects remaining in an inspected artifact can support this decision. Several studies in software engineering have considered capture-recapture models, originally proposed by biologists to estimate animal populations, for making such predictions. However, few studies compare the actual number of remaining defects with the number predicted by a capture-recapture model on real software engineering artifacts. Consequently, little work has examined the robustness of capture-recapture models under realistic software engineering conditions, where some of their assumptions are expected to be violated. Simulations have been performed, but they do not support definite conclusions about how accurate such models are under realistic inspection conditions or which factors affect that accuracy. Furthermore, existing studies have focused on a subset of the available capture-recapture models, so a more exhaustive comparison is still missing. In this study, we focus on traditional inspections and use actual inspection data to estimate the accuracy of relevant, state-of-the-art capture-recapture models as they have been proposed in biology and for which statistical estimators exist. To assess their robustness, we examine how the number of inspectors and the number of actual defects affect the estimators' accuracy. Our results show that the models are strongly affected by the number of inspectors, so this factor must be considered before capture-recapture models are used. When the number of inspectors is too small, no model is sufficiently accurate, and underestimation may be substantial.
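As an illustration of the capture-recapture principle, consider the simplest case of two inspectors reviewing the same artifact: defects reported by both play the role of "recaptured" animals. The sketch below uses the textbook Lincoln-Petersen estimator (total defects approximately n1 * n2 / m) together with a relative-error measure in which negative values indicate underestimation. The function names and defect counts are hypothetical, and this estimator is shown only to illustrate the idea; it is not presented as one of the specific model/estimator combinations evaluated in the study.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Two-inspector capture-recapture estimate of total defect content.

    n1, n2: number of defects reported by inspector 1 and inspector 2
    m: number of defects reported by both inspectors (the "recaptures")
    """
    if m == 0:
        # No overlap: the estimator is undefined (would divide by zero).
        raise ValueError("no overlap between inspectors; estimate is undefined")
    return n1 * n2 / m


def relative_error(estimate: float, actual: int) -> float:
    """(estimate - actual) / actual; negative values mean underestimation."""
    return (estimate - actual) / actual


# Hypothetical inspection: inspector A finds 12 defects, inspector B finds 10,
# and 6 of those defects are found by both.
est = lincoln_petersen(12, 10, 6)   # 120 / 6 = 20.0 estimated defects in total
found = 12 + 10 - 6                 # 16 distinct defects already found
remaining = est - found             # 4.0 estimated defects still remaining
print(est, remaining, relative_error(est, actual=25))
```

With only two inspectors, a small overlap m makes this estimate highly variable, which offers one intuition for why accuracy depends on the number of inspectors.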
In addition, some models perform better than others across a large number of conditions, and we discuss plausible reasons for this. Based on our analyses, we recommend using the model that accounts for defects having different probabilities of being detected (model Mh), together with its Jackknife estimator. Furthermore, we attempt to calibrate the prediction models using their relative error, as previously computed on other inspections. Although this approach is intuitive and straightforward, we identified theoretical limitations to it, which the data then confirmed.
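The recommended estimator can be made concrete with a small sketch. It assumes detections are recorded as one set of defect IDs per inspector (a hypothetical input format) and applies the standard first- and second-order jackknife formulas for model Mh, where S is the number of distinct defects found, t the number of inspectors, and f_k the number of defects found by exactly k inspectors; the first-order estimate is S + ((t - 1) / t) * f_1. The sketch does not implement the test-based selection of the jackknife order, and the `calibrate` helper is only one straightforward reading of relative-error calibration (dividing by one plus the mean historical relative error), not necessarily the formulation used in the study. All function names are hypothetical.

```python
from collections import Counter


def capture_frequencies(findings: list[set[str]]) -> Counter:
    """Return a Counter mapping k to the number of defects found by exactly k inspectors."""
    per_defect = Counter(d for found in findings for d in found)
    return Counter(per_defect.values())


def jackknife_mh(findings: list[set[str]], order: int = 1) -> float:
    """Jackknife estimate of total defect content under model Mh (sketch).

    findings: one set of (hypothetical) defect IDs per inspector.
    order: 1 or 2; choosing the order from the data is not implemented here.
    """
    t = len(findings)  # number of inspectors ("capture occasions")
    if t < 2:
        raise ValueError("at least two inspectors are required")
    f = capture_frequencies(findings)
    s = sum(f.values())  # distinct defects found by the whole team
    if order == 1:
        return s + (t - 1) / t * f[1]
    if order == 2:
        return (s + (2 * t - 3) / t * f[1]
                - (t - 2) ** 2 / (t * (t - 1)) * f[2])
    raise ValueError("only first- and second-order jackknife are sketched")


def calibrate(estimate: float, past_relative_errors: list[float]) -> float:
    """Correct an estimate by the mean relative error seen on earlier inspections.

    Assumes relative error = (estimate - actual) / actual, so that
    actual is approximately estimate / (1 + mean relative error).
    """
    mean_re = sum(past_relative_errors) / len(past_relative_errors)
    if mean_re <= -1:
        raise ValueError("mean relative error must be greater than -1")
    return estimate / (1 + mean_re)


# Hypothetical team of four inspectors.
findings = [
    {"D1", "D2", "D3", "D4"},
    {"D1", "D2", "D5"},
    {"D1", "D3", "D6"},
    {"D2", "D7"},
]
j1 = jackknife_mh(findings)           # 7 found + 0.75 * 4 singletons = 10.0
j2 = jackknife_mh(findings, order=2)
print(j1, j2, calibrate(j1, [-0.25, -0.25]))
```

A correction like `calibrate` assumes that the bias observed on earlier inspections carries over to the current one; since the study reports theoretical limitations of calibrating on relative error that the data then confirmed, such a correction should be treated with caution.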