Evaluation of Several Nonparametric Bootstrap Methods to Estimate Confidence Intervals for Software Metrics

Authors:
Skylar Lei;Michael R. Smith
Affiliations:
-;-
Venue:
IEEE Transactions on Software Engineering
Year:
2003

Abstract

In typical data-analytic situations, sample statistics and model parameters are used to infer the properties, or characteristics, of the underlying population. Confidence intervals provide an estimate of the range within which the true value of the statistic lies. A narrow confidence interval implies low variability of the statistic, justifying a strong conclusion from the analysis. Many statistics used in software metrics analysis do not come with theoretical formulas that allow such accuracy assessment. The Efron bootstrap statistical analysis appears to address this weakness. In this paper, we present an empirical analysis of the reliability of several Efron nonparametric bootstrap methods in assessing the accuracy of sample statistics in the context of software metrics. A brief review of the basic concepts behind the various methods available for estimating statistical errors is provided, and the stated advantages of the Efron bootstrap are discussed. Several different bootstrap algorithms are validated across basic software metrics in both simulated and industrial software engineering contexts. It was found that the 90 percent confidence intervals for the mean, median, and Spearman correlation coefficients were accurately predicted. The 90 percent confidence intervals for the variance and Pearson correlation coefficients were typically underestimated (corresponding to an actual 60-70 percent confidence interval), and those for skewness and kurtosis were overestimated (an actual 98-100 percent confidence interval). The bias-corrected and accelerated bootstrap approach gave the most consistent confidence intervals, but its accuracy depended on the metric examined. A method for correcting the under-/overestimation of bootstrap confidence intervals for small data sets is suggested, but the success of the approach was found to be inconsistent across the tested metrics.
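
To make the idea concrete, the following is a minimal sketch of a nonparametric percentile bootstrap, together with a simulation that checks how often a nominal 90 percent interval actually contains the true value. It is not the authors' implementation: the abstract does not say which algorithms or settings were used beyond naming the bias-corrected and accelerated variant, so the percentile method, the function names `percentile_bootstrap_ci` and `estimate_coverage`, the resample counts, and the normally distributed test population are all assumptions chosen for illustration.

```python
import random
import statistics


def percentile_bootstrap_ci(sample, statistic, level=0.90, n_resamples=2000, seed=0):
    """Percentile bootstrap interval for `statistic` (hypothetical helper).

    Resamples `sample` with replacement, recomputes the statistic each time,
    and returns the empirical quantiles that bracket `level` of the estimates.
    """
    rng = random.Random(seed)
    n = len(sample)
    estimates = sorted(
        statistic([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    lo_idx = int(alpha * n_resamples)
    hi_idx = min(n_resamples - 1, int((1.0 - alpha) * n_resamples))
    return estimates[lo_idx], estimates[hi_idx]


def estimate_coverage(statistic, true_value, draw_sample, level=0.90,
                      n_trials=200, n_resamples=500, seed=1):
    """Fraction of simulated samples whose interval contains `true_value`.

    `draw_sample` is an assumed callback that takes a random.Random instance
    and returns one sample from a population with a known true statistic.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        sample = draw_sample(rng)
        lo, hi = percentile_bootstrap_ci(
            sample, statistic, level, n_resamples, seed=rng.randrange(2**32)
        )
        hits += lo <= true_value <= hi  # a bool counts as 0 or 1
    return hits / n_trials


if __name__ == "__main__":
    # Made-up module sizes in lines of code; not data from the paper.
    sizes = [120, 340, 95, 410, 230, 180, 75, 510, 260, 150]
    print("90% interval for the median:",
          percentile_bootstrap_ci(sizes, statistics.median))

    # Small samples from a standard normal population, whose variance is 1.
    def draw(rng):
        return [rng.gauss(0.0, 1.0) for _ in range(20)]

    print("observed coverage for the variance:",
          estimate_coverage(statistics.variance, 1.0, draw,
                            n_trials=100, n_resamples=300))
```

An observed coverage close to 0.90 would indicate that the nominal interval is well calibrated for that statistic. A clearly lower value corresponds to what the abstract calls an underestimated interval, and a clearly higher value to an overestimated one.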