Predicting Faults in High Assurance Software

  • Authors:
  • Naeem Seliya;Taghi M. Khoshgoftaar;Jason Van Hulse

  • Affiliations:
  • -;-;-

  • Venue:
  • HASE '10 Proceedings of the 2010 IEEE 12th International Symposium on High-Assurance Systems Engineering
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Reducing the number of latent software defects is a development goal that is particularly applicable to high assurance software systems. For such systems, the software measurement and defect data is highly skewed toward the not-fault-prone program modules, i.e., the number of fault-prone modules is relatively very small. The skewed data problem, also known as class imbalance, poses a unique challenge when training a software quality estimation model. However, practitioners and researchers often build defect prediction models without regard to the skewed data problem. In high assurance systems, the class imbalance problem must be addressed when building defect predictors. This study investigates the roughly balanced bagging (RBBag) algorithm for building software quality models with data sets that suffer from class imbalance. The algorithm combines bagging and data sampling into one technique. A case study of 15 software measurement data sets from different real-world high assurance systems is used in our investigation of the RBBag algorithm. Two commonly used classification algorithms in the software engineering domain, Na篓ıve Bayes and C4.5 decision tree, are combined with RBBag for building the software quality models. The results demonstrate that defect prediction models based on the RBBag algorithm significantly outperform models built without any bagging or data sampling. The RBBag algorithm provides the analyst with a tool for effectively addressing class imbalance when training defect predictors during high assurance software development.