Intuition suggests that random testing should exhibit a considerable difference in the number of faults detected by two different runs of equal duration. As a consequence, random testing would be rather unpredictable. This article first evaluates the variance over time of the number of faults detected by randomly testing object-oriented software that is equipped with contracts. It presents the results of an empirical study based on 1215 h of randomly testing 27 Eiffel classes, each tested with 30 different seeds of the random number generator. The analysis of over 6 million failures triggered during the experiments shows that the relative number of faults detected by random testing over time is predictable, but that different runs of the random test case generator detect different faults. The experiments also suggest that random testing finds faults quickly: the first failure is likely to be triggered within 30 s. The second part of this article evaluates the nature of the faults found by random testing. To this end, it first introduces a fault classification scheme, which is then used to compare the faults found through random testing with those found through manual testing and with those found in field use of the software and recorded in user incident reports. The results of the comparisons show that each technique is good at uncovering different kinds of faults. None of the techniques subsumes any of the others; each brings distinct contributions. This supports a more general conclusion on comparisons between testing strategies: the number of detected faults is too coarse a criterion for such comparisons; the nature of the faults must also be considered. Copyright © 2009 John Wiley & Sons, Ltd.
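The setup the abstract describes, randomly exercising a contract-equipped class with a given seed and recording the distinct faults each run uncovers, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `BoundedStack` class, its seeded fault, and the fault labels are all hypothetical examples invented here; the actual study used Eiffel classes with native Design by Contract support.

```python
import random

class BoundedStack:
    """Toy contract-equipped class (hypothetical; not from the study)."""

    def __init__(self, capacity):
        assert capacity > 0                      # precondition
        self.capacity = capacity
        self.items = []

    def push(self, x):
        assert len(self.items) < self.capacity   # precondition: not full
        self.items.append(x)
        assert self.items[-1] == x               # postcondition

    def pop(self):
        assert self.items                        # precondition: not empty
        return self.items.pop()

    def top(self):
        assert self.items                        # precondition: not empty
        # Deliberately seeded fault: returns the wrong element
        # once the stack holds more than two items.
        return self.items[0] if len(self.items) > 2 else self.items[-1]

def random_test(seed, calls=2000):
    """One random-testing session; returns the set of distinct faults found."""
    rng = random.Random(seed)          # per-run seed, as in the study's setup
    faults = set()
    s = BoundedStack(5)
    for _ in range(calls):
        op = rng.choice(["push", "pop", "top"])
        try:
            if op == "push" and len(s.items) < s.capacity:
                s.push(rng.randint(0, 9))
            elif op == "pop" and s.items:
                s.pop()
            elif op == "top" and s.items:
                # Oracle: top must equal the most recently pushed element.
                if s.top() != s.items[-1]:
                    faults.add("top_returns_wrong_element")
        except AssertionError:
            # A contract violation signals a distinct kind of fault.
            faults.add("contract_violation")
    return faults
```

Because each session is driven by its own `random.Random(seed)`, a run is fully reproducible for a given seed, while two runs with different seeds may exercise different call sequences and hence, as the study observed, detect different faults.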