An Empirical Exploration of the Distributions of the Chidamber and Kemerer Object-Oriented Metrics Suite

  • Authors:
  • Giancarlo Succi;Witold Pedrycz;Snezana Djokic;Paolo Zuliani;Barbara Russo

  • Affiliations:
  • Center for Applied Software Engineering, Free University of Bolzano-Bozen, Bolzano-Bozen, Italy giancarlo.succi@unibz.it;Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada pedrycz@ee.ualberta.ca;Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada snezana@ee.ualberta.ca;Center for Applied Software Engineering, Free University of Bolzano-Bozen, Bolzano-Bozen, Italy paolo.zuliani@unibz.it;Center for Applied Software Engineering, Free University of Bolzano-Bozen, Bolzano-Bozen, Italy barbara.russo@unibz.it

  • Venue:
  • Empirical Software Engineering
  • Year:
  • 2005

Abstract

The object-oriented metrics suite proposed by Chidamber and Kemerer (CK) is a measurement approach aimed at improving object-oriented design and development practices. However, existing studies show traces of collinearity between some of the metrics and low ranges for others, two facts that may endanger the validity of models based on the CK suite. As high correlation may be an indicator of collinearity, in this paper we empirically determine to what extent high correlations and low ranges might be expected among CK metrics. To draw conclusions that are as general as possible, we extract the CK metrics from a large data set (200 public domain projects) and apply statistical meta-analysis techniques to strengthen the validity of our results. Homogeneously across the projects, we found moderate (∼0.50) to high (0.80) correlation between some of the metrics and low ranges for other metrics. The results of this empirical analysis offer researchers and practitioners three main pieces of advice: a) avoid using, in prediction systems, CK metrics that have a correlation above 0.80; b) test for collinearity those metrics that show moderate correlations (between 0.50 and 0.60); c) avoid using metrics with low variance as the response in continuous parametric regression analysis. This might therefore suggest that a prediction system should be based not on the whole CK metrics suite, but only on a subset consisting of those metrics that show neither high correlation nor low ranges.
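The three pieces of advice above can be read as a simple screening step applied before building a prediction model. The following is a minimal sketch of such a step, not the authors' procedure: the function names `pearson` and `screen_metrics`, the `min_distinct` cutoff for "low range", the use of Pearson correlation, and the choice to flag everything from 0.50 up to 0.80 for collinearity testing are all assumptions made for illustration. The abstract gives no low-range threshold and does not say how correlations between 0.60 and 0.80 should be treated. The toy data is invented and does not reproduce any result from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (pure Python)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # A constant metric has no defined correlation; treated as 0 here.
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_metrics(data, high=0.80, moderate=0.50, min_distinct=3):
    """Screen per-class metric values along the lines of advice a), b), c).

    data maps a metric name (e.g. "WMC") to a list of per-class values.
    high and moderate follow the thresholds quoted in the abstract;
    min_distinct is a hypothetical stand-in for "low range".
    """
    avoid, test_collinearity = [], []
    for a, b in combinations(sorted(data), 2):
        r = abs(pearson(data[a], data[b]))
        if r > high:
            avoid.append((a, b, round(r, 2)))              # advice a)
        elif r >= moderate:
            test_collinearity.append((a, b, round(r, 2)))  # advice b)
    # advice c): metrics taking very few distinct values
    low_range = [m for m in sorted(data) if len(set(data[m])) < min_distinct]
    return {
        "avoid": avoid,
        "test_collinearity": test_collinearity,
        "low_range": low_range,
    }


if __name__ == "__main__":
    # Invented values for six classes; illustration only.
    toy = {
        "WMC": [1, 2, 3, 4, 5, 6],
        "RFC": [2, 4, 6, 8, 10, 13],
        "DIT": [1, 1, 1, 2, 1, 1],
    }
    print(screen_metrics(toy))
```

In practice one would probably run this per project and then combine the per-project correlations with a meta-analysis technique, as the abstract describes. That combination step is left out of this sketch.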