You can't control the unfamiliar: A study on the relations between aggregation techniques for software metrics

  • Authors:
  • Bogdan Vasilescu;Alexander Serebrenik;Mark van den Brand

  • Affiliations:
  • Technische Universiteit Eindhoven, Den Dolech 2, P.O. Box 513, 5600 MB, The Netherlands;Technische Universiteit Eindhoven, Den Dolech 2, P.O. Box 513, 5600 MB, The Netherlands;Technische Universiteit Eindhoven, Den Dolech 2, P.O. Box 513, 5600 MB, The Netherlands

  • Venue:
  • ICSM '11 Proceedings of the 2011 27th IEEE International Conference on Software Maintenance
  • Year:
  • 2011

Abstract

A popular approach to assessing software maintainability and predicting its evolution involves collecting and analyzing software metrics. However, metrics are usually defined at the micro-level (method, class, package), and therefore need to be aggregated in order to provide insight into the evolution at the macro-level (system). In addition to traditional aggregation techniques such as the mean, median, or sum, econometric aggregation techniques such as the Gini, Theil, Kolm, Atkinson, and Hoover inequality indices have recently been proposed and applied to software metrics. In this paper we present the results of an extensive correlation study of the most widely used traditional and econometric aggregation techniques, applied to lifting SLOC values from class to package level in the 106 systems comprising the Qualitas Corpus. Moreover, we investigate the nature of this relation, and study its evolution on a subset of 12 systems from the Qualitas Corpus. Our results indicate high and statistically significant correlation between the Gini, Theil, Atkinson, and Hoover indices, i.e., aggregation values obtained using these techniques convey the same information. Nevertheless, we discuss some of the rationale for choosing one index over another.
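To make the notion of lifting class-level SLOC to a package-level value concrete, the following is a minimal sketch of four of the inequality indices named in the abstract, using their textbook definitions. It is not taken from the paper: the function names and the sample SLOC values are hypothetical, the Atkinson index is shown only for the inequality-aversion parameter equal to 1 (an assumption, since the abstract does not state which parameter was used), and all values are assumed to be strictly positive.

```python
import math


def gini(xs):
    """Gini index: mean absolute difference over all pairs, scaled by 2 * mean."""
    n = len(xs)
    mean = sum(xs) / n
    total_diff = sum(abs(a - b) for a in xs for b in xs)
    return total_diff / (2 * n * n * mean)


def theil(xs):
    """Theil index: average of (x / mean) * ln(x / mean); requires x > 0."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x / mean) * math.log(x / mean) for x in xs) / n


def hoover(xs):
    """Hoover index: share of the total that would have to be redistributed."""
    mean = sum(xs) / len(xs)
    return sum(abs(x - mean) for x in xs) / (2 * sum(xs))


def atkinson(xs):
    """Atkinson index with parameter 1 (assumed): 1 - geometric mean / mean."""
    n = len(xs)
    mean = sum(xs) / n
    geometric_mean = math.exp(sum(math.log(x) for x in xs) / n)
    return 1 - geometric_mean / mean


# Hypothetical SLOC values for the classes of a single package.
package_sloc = [12, 40, 45, 300, 1500]

for index in (gini, theil, hoover, atkinson):
    print(index.__name__, round(index(package_sloc), 3))
```

All four functions return 0 when every class in the package has the same SLOC and grow as the code becomes concentrated in a few classes, which is consistent with the abstract's observation that these indices convey largely the same information.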