Mining Software Evolution to Predict Refactoring

  • Authors:
  • Jacek Ratzinger; Thomas Sigmund; Peter Vorburger; Harald Gall

  • Affiliations:
  • Vienna University of Technology, Austria; Vienna University of Technology, Austria; University of Zurich, Switzerland; University of Zurich, Switzerland

  • Venue:
  • ESEM '07 Proceedings of the First International Symposium on Empirical Software Engineering and Measurement
  • Year:
  • 2007

Abstract

Can we predict locations of future refactoring based on the development history? In an empirical study of open source projects, we found that attributes of software evolution data can be used to predict the need for refactoring in the following two months of development. Information systems used in software projects provide a broad range of data for decision support. Versioning systems log each activity during development; from these logs we extract data mining features such as growth measures, relationships between classes, and the number of authors working on a particular piece of code. We use this information as input to classification algorithms to create prediction models for future refactoring activities. We investigate several state-of-the-art classifiers, including decision trees, logistic model trees, propositional rule learners, and nearest neighbor algorithms. The resulting models assess the refactoring proneness of object-oriented systems with both high precision and high recall. Although the studied projects come from different domains, we discovered critical factors within the development life cycle that lead to refactoring and are common to all of them.
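
The pipeline described in the abstract (derive per-class features from versioning logs, then feed them to a classifier) can be illustrated with a minimal sketch. The abstract does not give the exact feature definitions or classifier configurations, so everything below is an assumption made for illustration: the `Change` record, the three features (net line growth, number of distinct authors, number of co-changed files), the hand-written training examples, and the plain 1-nearest-neighbor rule on unscaled features. In a real setting the label would be whether the file was refactored in the following two months.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Change:
    """One file touched by one commit (hypothetical log record)."""
    commit_id: str
    author: str
    path: str
    added: int
    removed: int


def extract_features(changes):
    """Return {path: (net_growth, n_authors, n_co_changed_files)}."""
    growth = defaultdict(int)
    authors = defaultdict(set)
    by_commit = defaultdict(set)
    for ch in changes:
        growth[ch.path] += ch.added - ch.removed
        authors[ch.path].add(ch.author)
        by_commit[ch.commit_id].add(ch.path)

    # Files committed together are treated as related (a stand-in for
    # the "relationships between classes" mentioned in the abstract).
    coupled = defaultdict(set)
    for paths in by_commit.values():
        for p in paths:
            coupled[p] |= paths - {p}

    return {p: (growth[p], len(authors[p]), len(coupled[p])) for p in growth}


def predict_1nn(train, query):
    """Return the label of the training example closest to `query`.

    `train` is a list of (feature_vector, label) pairs. Features are
    used unscaled here, which is a simplification.
    """
    _, label = min(train, key=lambda item: dist(item[0], query))
    return label


# Made-up change log for one observation window.
changes = [
    Change("c1", "ann", "A.java", 120, 10),
    Change("c1", "ann", "B.java", 5, 0),
    Change("c2", "bob", "A.java", 80, 5),
    Change("c3", "eve", "A.java", 40, 0),
    Change("c3", "eve", "C.java", 3, 1),
]
features = extract_features(changes)

# Made-up labelled history: True means "was refactored afterwards".
train = [((200, 3, 2), True), ((4, 1, 0), False)]

for path in sorted(features):
    print(path, features[path], predict_1nn(train, features[path]))
```

With this toy data, the fast-growing, multi-author `A.java` lands next to the refactored training example, while the small single-author files do not. The other classifier families named in the abstract would consume the same feature vectors.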