Clustering and Metrics Thresholds Based Software Fault Prediction of Unlabeled Program Modules

  • Authors:
  • Cagatay Catal; Ugur Sevim; Banu Diri

  • Venue:
  • ITNG '09 Proceedings of the 2009 Sixth International Conference on Information Technology: New Generations
  • Year:
  • 2009

Abstract

Predicting the fault-proneness of program modules when fault labels are unavailable is a practical problem frequently encountered in the software industry. Because fault data from previous software versions is not available, supervised learning approaches cannot be applied, which creates a need for new methods, tools, or techniques. In this study, we propose a software fault prediction approach based on clustering and metrics thresholds for this challenging problem and evaluate it on three datasets collected from a Turkish white-goods manufacturer that develops embedded controller software. Experiments reveal that unsupervised software fault prediction can be automated and that techniques based on metrics thresholds and clustering can produce reasonable results. The results demonstrate the effectiveness of metrics thresholds and show that applying metrics thresholds alone (one-stage) is currently easier than the combined clustering and metrics thresholds (two-stage) approach, because the number of clusters in the clustering-based method is selected heuristically.
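
The difference between the one-stage and two-stage approaches can be illustrated with a minimal Python sketch. It is one plausible reading of the abstract, not the authors' implementation. The metric names, the threshold values in `THRESHOLDS`, the "any metric above its threshold" rule, the plain k-means routine, and the sample data are all assumptions introduced here for illustration. In the one-stage variant each module is compared against the thresholds directly. In the two-stage variant the modules are clustered first and each cluster's centroid is compared against the thresholds, so the number of clusters `k` must be chosen by hand.

```python
import random

# Hypothetical metric thresholds, chosen only for illustration.
THRESHOLDS = {"loc": 65, "cyclomatic": 10, "operators": 25}


def exceeds_threshold(metrics, thresholds=THRESHOLDS):
    """Return True if any metric value is above its threshold (assumed rule)."""
    return any(metrics[name] > limit for name, limit in thresholds.items())


def one_stage(modules, thresholds=THRESHOLDS):
    """One-stage: apply the thresholds to every module directly."""
    return [exceeds_threshold(m, thresholds) for m in modules]


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on tuples of floats; returns (centroids, assignment)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        assignment = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        # Move each centroid to the mean of its members; keep it if empty.
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


def two_stage(modules, k, thresholds=THRESHOLDS):
    """Two-stage: cluster the modules, then apply the thresholds to each centroid."""
    names = list(thresholds)
    points = [tuple(float(m[n]) for n in names) for m in modules]
    centroids, assignment = kmeans(points, k)
    cluster_faulty = [
        exceeds_threshold(dict(zip(names, c)), thresholds) for c in centroids
    ]
    # Every module inherits the label of the cluster it belongs to.
    return [cluster_faulty[a] for a in assignment]


# Made-up sample modules: two small ones and two large, complex ones.
demo_modules = [
    {"loc": 10, "cyclomatic": 2, "operators": 5},
    {"loc": 12, "cyclomatic": 3, "operators": 6},
    {"loc": 200, "cyclomatic": 25, "operators": 80},
    {"loc": 220, "cyclomatic": 30, "operators": 90},
]

one_stage_labels = one_stage(demo_modules)
two_stage_labels = two_stage(demo_modules, k=2)
```

On this well-separated sample both variants flag only the last two modules. The two-stage variant needs the extra parameter `k`, which matches the abstract's remark that cluster-number selection is the heuristic step.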