On the number and nature of faults found by random testing

  • Authors:
  • I. Ciupa; A. Pretschner; M. Oriol; A. Leitner; B. Meyer

  • Affiliations:
  • Department of Computer Science, ETH Zürich, Switzerland; Fraunhofer IESE and TU Kaiserslautern, Germany; Department of Computer Science, University of York, U.K.; (Work carried out while this author worked at ETH Zurich.) Google Zurich, Switzerland; Department of Computer Science, ETH Zürich, Switzerland

  • Venue:
  • Software Testing, Verification & Reliability
  • Year:
  • 2011

Abstract

Intuition suggests that random testing should exhibit a considerable difference in the number of faults detected by two different runs of equal duration. As a consequence, random testing would be rather unpredictable. This article first evaluates the variance over time of the number of faults detected by randomly testing object-oriented software that is equipped with contracts. It presents the results of an empirical study based on 1215 hours of randomly testing 27 Eiffel classes, each with 30 seeds of the random number generator. The analysis of over 6 million failures triggered during the experiments shows that the relative number of faults detected by random testing over time is predictable, but that different runs of the random test case generator detect different faults. The experiment also suggests that random testing finds faults quickly: the first failure is likely to be triggered within 30 seconds. The second part of this article evaluates the nature of the faults found by random testing. To this end, it first explains a fault classification scheme, which is also used to compare the faults found through random testing with those found through manual testing and with those found in field use of the software and recorded in user incident reports. The results of the comparisons show that each technique is good at uncovering different kinds of faults. None of the techniques subsumes any of the others; each brings distinct contributions. This supports a more general conclusion on comparisons between testing strategies: the number of detected faults is too coarse a criterion for such comparisons, and the nature of the faults must also be considered. Copyright © 2009 John Wiley & Sons, Ltd.
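
The experimental design described in the abstract (many runs per class, one per random seed, then comparing fault counts and fault sets across runs) can be illustrated with a toy simulation. The sketch below is not the authors' tool or method: `simulate_run`, `summarize`, and the fault-triggering probabilities are hypothetical names and values chosen only for illustration, and the per-run duration assumes the 1215 hours were split evenly across the 27 × 30 runs, which the abstract does not state.

```python
import random
import statistics

# Figures from the abstract; the even split across runs is an assumption.
CLASSES, SEEDS, TOTAL_HOURS = 27, 30, 1215
HOURS_PER_RUN = TOTAL_HOURS / (CLASSES * SEEDS)  # 1.5 under that assumption


def simulate_run(seed, fault_probs, n_tests):
    """Toy model: each test triggers fault i with probability fault_probs[i].

    Returns a dict mapping fault index to the index of the first test
    that triggered it. Faults never triggered are absent from the dict.
    """
    rng = random.Random(seed)  # one independent generator per seed
    first_hit = {}
    for t in range(n_tests):
        for i, p in enumerate(fault_probs):
            if i not in first_hit and rng.random() < p:
                first_hit[i] = t
    return first_hit


def summarize(fault_probs, n_tests, seeds):
    """Run one simulation per seed and compare the runs."""
    runs = [simulate_run(s, fault_probs, n_tests) for s in seeds]
    counts = [len(r) for r in runs]  # distinct faults found per run
    union = set().union(*(r.keys() for r in runs))  # faults found by any run
    return {
        "counts": counts,
        "mean": statistics.mean(counts),
        "stdev": statistics.stdev(counts),  # needs at least two seeds
        "union": len(union),
    }


if __name__ == "__main__":
    # Hypothetical probabilities, from easy-to-hit to rare faults.
    probs = [0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001]
    print(HOURS_PER_RUN, summarize(probs, n_tests=500, seeds=range(30)))
```

In this toy model, comparing `counts` with `union` separates the two questions the abstract raises: how much the number of faults varies between runs, and whether different runs find different faults (the union exceeds any single run's count when they do).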