Is non-parametric hypothesis testing model robust for statistical fault localization?

  • Authors:
  • Zhenyu Zhang;W. K. Chan;T. H. Tse;Peifeng Hu;Xinming Wang

  • Affiliations:
  • Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong;Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Hong Kong;Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong;China Merchants Bank, Central, Hong Kong;Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong

  • Venue:
  • Information and Software Technology
  • Year:
  • 2009

Abstract

Fault localization is one of the most difficult activities in software debugging. Many existing statistical fault-localization techniques estimate the fault positions of programs by comparing the program feature spectra between passed runs and failed runs. Some existing approaches develop estimation formulas based on mean values of the underlying program feature spectra and their distributions alike. Our previous work advocates the use of a non-parametric approach in estimation formulas to pinpoint fault-relevant positions. It is worth further study to resolve the two schools of thought by examining the fundamental, underlying properties of distributions related to fault localization. In particular, we ask: Can the feature spectra of program elements be safely considered as normal distributions, so that parametric techniques can be soundly and powerfully applied? In this paper, we empirically investigate this question from the program predicate perspective. We conduct an experimental study based on the Siemens suite of programs. We examine the degree of normality of the distributions of evaluation biases of the predicates, and obtain three major results from the study. First, almost all examined distributions of evaluation biases are either normal or far from normal, but not in between. Second, the most fault-relevant predicates are less likely than other predicates to exhibit normal distributions in terms of evaluation biases. Our results show that normality is not common, at least as far as evaluation biases are concerned. Third, the effectiveness of our non-parametric predicate-based fault-localization technique correlates only weakly with the distributions of evaluation biases, which makes the technique robust to this type of uncertainty in the underlying program spectra.
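
To make the idea of "examining the degree of normality of evaluation biases" concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes the definition of evaluation bias common in predicate-based fault localization (the fraction of a predicate's evaluations in one run that come out true), and it assumes the Jarque-Bera statistic as the normality test; the abstract names neither. The function names `evaluation_bias` and `jarque_bera` are hypothetical.

```python
import math


def evaluation_bias(n_true, n_false):
    """Assumed definition: share of a predicate's evaluations in one run
    that were true. Returns None if the predicate was never evaluated."""
    total = n_true + n_false
    if total == 0:
        return None
    return n_true / total


def jarque_bera(sample):
    """Jarque-Bera normality statistic and its asymptotic p-value.

    Assumption: this test stands in for whatever normality test the
    study actually applied. A small p-value suggests the sample is far
    from normal; the approximation is rough for small samples.
    """
    n = len(sample)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(sample) / n
    # Central moments of order 2, 3 and 4 (biased estimators).
    m2 = sum((x - mean) ** 2 for x in sample) / n
    if m2 == 0:
        # Convention chosen here: a constant sample is treated as
        # degenerate, hence as non-normal.
        return math.inf, 0.0
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Chi-square survival function with 2 degrees of freedom.
    p_value = math.exp(-jb / 2)
    return jb, p_value
```

In such a setup, one would collect the evaluation bias of a given predicate over all passed runs (and, separately, over all failed runs) and apply the test to each collection. A distribution concentrated at 0 with a few runs at 1, for instance, would yield a very small p-value.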