An investigation of the effect of module size on defect prediction using static measures

  • Authors:
  • A. Günes Koru; Hongfang Liu

  • Affiliations:
  • University of Maryland, Baltimore County - UMBC, Baltimore, MD; University of Maryland, Baltimore County - UMBC, Baltimore, MD

  • Venue:
  • PROMISE '05 Proceedings of the 2005 workshop on Predictor models in software engineering
  • Year:
  • 2005

Abstract

We used several machine learning algorithms to predict the defective modules in five NASA products: CM1, JM1, KC1, KC2, and PC1. A set of static measures was employed as predictor variables. While doing so, we observed that a large portion of the modules were small, as measured by lines of code (LOC). When we experimented on data subsets created by partitioning the modules according to size, we obtained higher prediction performance for the subsets that included larger modules. We also performed defect prediction for KC1 using class-level data rather than method-level data, and the class-level data yielded better prediction performance. These findings suggest that quality assurance activities can be guided more effectively if defect prediction is performed using data that belong to larger modules.
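
The size-based partitioning described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the record layout, the LOC boundaries, the toy records, and the names `partition_by_size`, `recall_precision`, and `complexity_rule` are all hypothetical, and a fixed complexity threshold stands in for the machine learning algorithms, which the abstract does not name.

```python
from bisect import bisect_right
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical record layout: (lines of code, cyclomatic complexity, defective?).
Module = Tuple[int, int, bool]

# Made-up toy records for illustration only; NOT taken from the NASA data sets.
TOY_MODULES: List[Module] = [
    (5, 1, False), (8, 2, False), (12, 3, True), (40, 6, False),
    (75, 12, True), (120, 15, True), (300, 9, False),
]


def partition_by_size(modules: Sequence[Module],
                      boundaries: Sequence[int]) -> List[List[Module]]:
    """Split modules into LOC bins; `boundaries` must be sorted ascending.

    Bin 0 holds modules with LOC < boundaries[0]; the last bin holds
    modules with LOC >= boundaries[-1].
    """
    bins: List[List[Module]] = [[] for _ in range(len(boundaries) + 1)]
    for module in modules:
        bins[bisect_right(boundaries, module[0])].append(module)
    return bins


def recall_precision(modules: Sequence[Module],
                     predict: Callable[[Module], bool]
                     ) -> Tuple[Optional[float], Optional[float]]:
    """Return (recall, precision) for `predict`; None where undefined."""
    tp = fp = fn = 0
    for module in modules:
        predicted, actual = predict(module), module[2]
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
    recall = tp / (tp + fn) if tp + fn else None
    precision = tp / (tp + fp) if tp + fp else None
    return recall, precision


def complexity_rule(module: Module) -> bool:
    """Assumed stand-in predictor: flag modules with complexity >= 10."""
    return module[1] >= 10


if __name__ == "__main__":
    # Assumed LOC boundaries; the abstract does not state the cut points used.
    for index, subset in enumerate(partition_by_size(TOY_MODULES, [10, 100])):
        print(index, len(subset), recall_precision(subset, complexity_rule))
```

Evaluating each size bin separately, rather than the pooled data, is what lets performance on small and large modules be compared. The numbers printed for the toy records carry no empirical meaning.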