Clustering source code files to predict change propagation during software maintenance

  • Authors:
  • Megan Bailey;King-Ip Lin;Linda Sherrell

  • Affiliations:
  • University of Memphis, Memphis, TN;University of Memphis, Memphis, TN;University of Memphis, Memphis, TN

  • Venue:
  • Proceedings of the 50th Annual Southeast Regional Conference
  • Year:
  • 2012

Abstract

This paper addresses a question frequently asked by software developers performing maintenance tasks on large systems: "If I make a change in this file, are there other files that need to change too?" If a development tool could answer this question automatically, time and money could be saved during software maintenance. The proposed solution builds on past research that applies data mining techniques to information extracted from the CVS change repository. We define a distance measure that combines the revision history of files with text-based information, cluster change sets of files, and then calculate a membership value of each file in each cluster to create groupings of files that are likely to change together in the future. Our approach predicts files that may need to be modified based on these clusters, and we evaluate these predictions on portions of the open-source Eclipse project.
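The abstract does not specify the exact distance measure, clustering algorithm, or membership function, so the following Python code is only a minimal sketch of how such a pipeline could look under stated assumptions. Co-change distance is assumed to be 1 minus the Jaccard similarity of the sets of commits touching two files. Text distance is assumed to be 1 minus the cosine similarity of token counts. The weighting parameter `alpha`, the leader-style clustering threshold `tau`, the normalized membership values, and the prediction `threshold` are all hypothetical choices, as are every function name and the toy data. This is not the authors' method.

```python
import math
from collections import Counter

# Hypothetical sketch: the distance mix (alpha), clustering rule (tau),
# membership definition and prediction threshold are illustrative assumptions,
# not the method published in the paper.

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def file_history(change_sets):
    """Map each file to the set of commit IDs (change sets) that touched it."""
    history = {}
    for cid, files in change_sets.items():
        for f in files:
            history.setdefault(f, set()).add(cid)
    return history

def text_distance(tokens_a, tokens_b):
    """1 - cosine similarity of bag-of-words token counts."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return 1.0 - dot / norm if norm else 1.0

def combined_distance(f, g, history, tokens, alpha=0.5):
    """Weighted mix of co-change distance and textual distance (alpha is assumed)."""
    cochange = 1.0 - jaccard(history.get(f, set()), history.get(g, set()))
    return alpha * cochange + (1 - alpha) * text_distance(tokens.get(f, []), tokens.get(g, []))

def cluster_change_sets(change_sets, tau=0.5):
    """Leader-style clustering: merge each change set into the most similar
    cluster (Jaccard overlap of files) if overlap >= tau, else start a new one."""
    clusters = []
    for cid in sorted(change_sets):
        files = set(change_sets[cid])
        best, best_sim = None, -1.0
        for i, cluster in enumerate(clusters):
            sim = jaccard(files, cluster)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= tau:
            clusters[best] |= files
        else:
            clusters.append(files)
    return clusters

def memberships(f, clusters, history, tokens, alpha=0.5):
    """Membership of file f in each cluster: 1 - mean distance to the cluster's
    other members, normalized so the values sum to 1."""
    sims = []
    for cluster in clusters:
        others = [g for g in cluster if g != f]
        if not others:
            sims.append(0.0)
            continue
        mean_d = sum(combined_distance(f, g, history, tokens, alpha) for g in others) / len(others)
        sims.append(1.0 - mean_d)
    total = sum(sims)
    return [s / total for s in sims] if total else [0.0] * len(clusters)

def predict(changed, clusters, history, tokens, threshold=0.5, alpha=0.5):
    """Files to suggest when `changed` is edited: members of every cluster in
    which `changed` has membership >= threshold."""
    suggested = set()
    for cluster, m in zip(clusters, memberships(changed, clusters, history, tokens, alpha)):
        if m >= threshold:
            suggested |= cluster
    suggested.discard(changed)
    return sorted(suggested)

# Toy usage with made-up commits and tokens.
change_sets = {1: {"A.java", "B.java"}, 2: {"A.java", "B.java"},
               3: {"A.java", "C.java"}, 4: {"C.java", "D.java"}}
tokens = {"A.java": ["parser", "token", "ast"], "B.java": ["parser", "token"],
          "C.java": ["ui", "button", "ast"], "D.java": ["ui", "button"]}
history = file_history(change_sets)
clusters = cluster_change_sets(change_sets)
print(predict("B.java", clusters, history, tokens))  # ['A.java']
```

In this toy run, `B.java` has membership 2/3 in the `{A.java, B.java}` cluster and 1/3 in `{A.java, C.java}`, so only `A.java` is suggested. Normalizing memberships to sum to 1 is a fuzzy-clustering-style choice made here for illustration; a real tool would tune `alpha`, `tau`, and `threshold` against historical data.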