Clustering source code files to predict change propagation during software maintenance

Authors:
Megan Bailey;King-Ip Lin;Linda Sherrell
Affiliations:
University of Memphis, Memphis, TN;University of Memphis, Memphis, TN;University of Memphis, Memphis, TN
Venue:
Proceedings of the 50th Annual Southeast Regional Conference
Year:
2012


Abstract

This paper addresses a question frequently asked by software developers performing maintenance tasks on large systems: "If I make a change in this file, are there other files that need to change too?" A development tool that could answer this question automatically would save time and money during software maintenance. Following prior research, the proposed solution applies data mining techniques to information extracted from the CVS change repository. We define a distance measure that uses both the revision history of files and text-based information, cluster change sets of files, and then calculate a membership value of each file in each cluster to create groupings of files that are likely to change together in the future. Our approach uses these clusters to predict files that may need to be modified, and we evaluate the predictions on portions of the open-source Eclipse project.
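
The pipeline outlined in the abstract (distance measure, clustering of change sets, per-file membership values, prediction) can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the distance is taken to be a weighted mix of Jaccard distances over file sets and commit-message words, the clustering is a simple single-link grouping under a threshold, membership is the fraction of a cluster's change sets that touch a file, and the names `alpha`, `threshold`, `min_score`, and `HISTORY` are hypothetical. It is not the authors' implementation.

```python
import re


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def words(text):
    """Lowercased alphabetic words of a commit message (assumed text feature)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def distance(cs1, cs2, alpha=0.5):
    """Assumed distance between two change sets, each a (files, message) pair.

    alpha weights the revision-history part against the text part.
    """
    history_part = jaccard_distance(cs1[0], cs2[0])
    text_part = jaccard_distance(words(cs1[1]), words(cs2[1]))
    return alpha * history_part + (1 - alpha) * text_part


def cluster(change_sets, threshold=0.7):
    """Single-link grouping: change sets closer than threshold share a cluster."""
    parent = list(range(len(change_sets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(change_sets)):
        for j in range(i + 1, len(change_sets)):
            if distance(change_sets[i], change_sets[j]) < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, cs in enumerate(change_sets):
        groups.setdefault(find(i), []).append(cs)
    return list(groups.values())


def membership(file, members):
    """Fraction of a cluster's change sets that include the file."""
    return sum(file in files for files, _ in members) / len(members)


def predict(changed_file, clusters, min_score=0.25):
    """Files that share a cluster with changed_file, filtered by joint membership."""
    scores = {}
    for members in clusters:
        m = membership(changed_file, members)
        if m == 0:
            continue
        for f in set().union(*(files for files, _ in members)):
            if f != changed_file:
                scores[f] = max(scores.get(f, 0.0), m * membership(f, members))
    return sorted(f for f, s in scores.items() if s >= min_score)


# Hypothetical change history: (files changed together, commit message).
HISTORY = [
    ({"Parser.java", "Lexer.java"}, "fix parser token bug"),
    ({"Parser.java", "Lexer.java", "Ast.java"}, "parser token handling"),
    ({"Ui.java", "Theme.java"}, "update ui colors"),
    ({"Ui.java", "Theme.java"}, "ui theme colors"),
]
clusters = cluster(HISTORY)
print(predict("Parser.java", clusters))
```

Raising `min_score` trades recall for precision: a file that appears in only some of a cluster's change sets drops out of the prediction first.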