Studying software evolution using artefacts' shared information content

  • Authors: Tom Arbuckle

  • Venue: Science of Computer Programming
  • Year: 2011

Abstract

In order to study software evolution, it is necessary to measure artefacts representative of project releases. If we consider the process of software evolution to be copying with subsequent modification, then, by analogy, placing emphasis on what remains the same between releases will lead to focusing on similarity between artefacts. At the same time, software artefacts, stored digitally as binary strings, are all information. This paper introduces a new method for measuring software evolution in terms of artefacts' shared information content. A similarity value representing the quantity of information shared between artefact pairs is produced using a calculation based on Kolmogorov complexity. Similarity values for releases are then collated over the software's evolution to form a map quantifying change through lack of similarity. The method has general applicability: it can disregard otherwise salient software features such as programming paradigm, language or application domain because it considers software artefacts purely in terms of the mathematically justified concept of information content. Three open-source projects are analysed to show the method's utility. Preliminary experiments on udev and git verify the measurement of the projects' evolutions. An experiment on ArgoUML validates the measured evolution against experimental data from other studies.
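
Kolmogorov complexity itself is not computable, so a calculation based on it is usually approximated in practice with a real compressor. The sketch below illustrates one common approximation, the normalized compression distance (NCD), and collates pairwise values into a release-by-release map. It is an assumption that this resembles the paper's calculation; the abstract does not give the formula. The function names (`compressed_size`, `ncd`, `distance_map`) and the choice of zlib are illustrative only.

```python
import zlib
from typing import List, Sequence


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data: a computable stand-in
    (assumption) for the Kolmogorov complexity of the byte string."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two artefacts.

    Values near 0 mean the artefacts share most of their information;
    values near 1 mean they share almost none.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_map(releases: Sequence[bytes]) -> List[List[float]]:
    """Pairwise NCD values for a sequence of release artefacts,
    ordered as given (for example, chronologically)."""
    return [[ncd(a, b) for b in releases] for a in releases]
```

Given one artefact per release, read as bytes, `distance_map` returns a square matrix in which large values mark pairs of releases with little shared information. Note that zlib only finds repeats within a 32 KiB window, so for large artefacts a compressor with a bigger window (such as `lzma`) is likely a better fit.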