On the statistical properties of testing effectiveness measures

Authors:
Tsong Yueh Chen;Fei-Ching Kuo;Robert Merkel
Affiliations:
Faculty of Information and Communication Technologies, Swinburne University of Technology, John Street, Hawthorn 3122, Australia (all three authors)
Venue:
Journal of Systems and Software - Special issue: Quality software
Year:
2006

Abstract

We examine the statistical variability of three commonly used software testing effectiveness measures: the E-measure (expected number of failures detected), the P-measure (probability of detecting at least one failure), and the F-measure (number of tests required to detect the first failure). We show that for random testing with replacement, the F-measure follows the geometric distribution. A simulation study examines the F-measure of two adaptive random testing methods to investigate how closely their sampling distributions approximate the geometric distribution. One key observation is that, in the worst-case scenario, the sampling distribution of adaptive random testing is very similar to that of random testing. The E-measure and P-measure have a normal sampling distribution but high variability, meaning that large sample sizes are required to obtain results with satisfactorily narrow confidence intervals. We illustrate this with a simulation study for the P-measure. Our results reinforce, from a perspective other than empirical analysis, that adaptive random testing is a more effective alternative to random testing with reference to the F-measure. We consider the implications of our findings for previous studies conducted in the area, and make recommendations for future studies.
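
The two statistical claims in the abstract can be illustrated with a minimal Python sketch. It is not the authors' simulation code. It assumes a fixed failure rate, here called theta (a name and value chosen only for illustration), and uses a plain normal-approximation confidence interval for a proportion. The function names are likewise hypothetical. Under those assumptions, the F-measure of random testing with replacement is geometric with mean 1/theta, and the number of trials needed to estimate a P-measure grows with the inverse square of the desired interval half-width.

```python
import math
import random


def f_measure_sample(theta, rng):
    """Draw one F-measure: the number of tests up to and including the
    first failure, for random testing with replacement at failure rate theta."""
    count = 1
    while rng.random() >= theta:  # each test fails with probability theta
        count += 1
    return count


def geometric_pmf(k, theta):
    """P(F = k) for the geometric distribution on k = 1, 2, ..."""
    return (1.0 - theta) ** (k - 1) * theta


def p_measure(theta, n):
    """Probability that n random tests with replacement detect at least one failure."""
    return 1.0 - (1.0 - theta) ** n


def trials_for_half_width(p, half_width, z=1.96):
    """Trials needed so that a normal-approximation confidence interval on a
    proportion p has the requested half-width (z = 1.96 for roughly 95%)."""
    return math.ceil(z * z * p * (1.0 - p) / (half_width ** 2))


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed, chosen arbitrarily
    theta = 0.01                # assumed failure rate, for illustration only

    samples = [f_measure_sample(theta, rng) for _ in range(20000)]
    mean_f = sum(samples) / len(samples)
    print("simulated mean F-measure:", mean_f)
    print("geometric mean 1/theta:  ", 1.0 / theta)

    p = p_measure(theta, 50)
    print("P-measure for 50 tests:", p)
    print("trials for half-width 0.01:", trials_for_half_width(p, 0.01))
```

Because the standard deviation of a geometric variable is close to its mean when theta is small, single F-measure observations vary widely, which is why the sketch averages many samples before comparing against 1/theta.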