Comparison of Clustering Algorithms in the Context of Software Evolution

Authors:
Jingwei Wu; Ahmed E. Hassan; Richard C. Holt
Affiliations:
University of Waterloo; University of Waterloo; University of Waterloo
Venue:
ICSM '05: Proceedings of the 21st IEEE International Conference on Software Maintenance
Year:
2005

Cited by (9):

Hierarchical Clustering for Software Architecture Recovery
IEEE Transactions on Software Engineering

Splitting a large software repository for easing future software evolution—an industrial experience report
Journal of Software Maintenance and Evolution: Research and Practice - Special Issue on the 12th Conference on Software Maintenance and Reengineering (CSMR 2008)

A biting-down approach to hierarchical decomposition of object-oriented systems based on structure analysis
Journal of Software Maintenance and Evolution: Research and Practice

Applying a dynamic threshold to improve cluster detection of LSI
Science of Computer Programming

Clustering methodologies for software engineering
Advances in Software Engineering

Leveraging design rules to improve software architecture recovery
Proceedings of the 9th international ACM Sigsoft conference on Quality of software architectures

Obtaining ground-truth software architectures
Proceedings of the 2013 International Conference on Software Engineering

Efficient software clustering technique using an adaptive and preventive dendrogram cutting approach
Information and Software Technology

Improving software modularization via automated analysis of latent topics and dependencies
ACM Transactions on Software Engineering and Methodology (TOSEM)

Abstract

To aid software analysis and maintenance tasks, a number of software clustering algorithms have been proposed to automatically partition a software system into meaningful subsystems or clusters. However, it is unknown whether these algorithms produce similar, meaningful clusterings for similar versions of a real-life software system under continual change and growth. This paper describes a comparative study of six software clustering algorithms. We applied each algorithm to successive versions of five large open source systems and compared the results against three criteria: stability (Does the clustering change only modestly as the system undergoes modest updating?), authoritativeness (Does the clustering reasonably approximate the structure an authority provides?), and extremity of cluster distribution (Does the clustering avoid huge clusters and many very small clusters?). Experimental results indicate that the studied algorithms exhibit distinct characteristics. For example, the clusterings from the most stable algorithm bear little similarity to the implemented system structure, while the clusterings from the least stable algorithm have the best cluster distribution. Based on these results, we claim that current automatic clustering algorithms need significant improvement to provide continual support for large software projects.
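The three comparison criteria can be made concrete with a small sketch. It assumes each clustering is a mapping from a file name to a cluster id. The pairwise-agreement score (a Rand-index variant), the helper names, and the size bounds in `non_extreme_ratio` are illustrative assumptions chosen for this example, not the measures or thresholds used in the study.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List

# A clustering maps each entity (e.g. a source file) to a cluster id.
Clustering = Dict[str, str]


def pairwise_agreement(a: Clustering, b: Clustering) -> float:
    """Rand-index-style agreement in [0, 1] between two clusterings.

    Only entities present in both clusterings are compared, so files added
    or removed between versions are ignored. Cluster ids themselves do not
    matter, only whether two entities share a cluster.
    """
    common = sorted(set(a) & set(b))
    pairs = list(combinations(common, 2))
    if not pairs:
        return 1.0  # nothing to disagree about
    agree = sum((a[x] == a[y]) == (b[x] == b[y]) for x, y in pairs)
    return agree / len(pairs)


def stability(versions: List[Clustering]) -> float:
    """Mean agreement between clusterings of consecutive versions."""
    if len(versions) < 2:
        raise ValueError("need at least two versions")
    scores = [pairwise_agreement(old, new) for old, new in zip(versions, versions[1:])]
    return sum(scores) / len(scores)


def authoritativeness(clustering: Clustering, authority: Clustering) -> float:
    """Agreement with an authoritative decomposition (hypothetical input)."""
    return pairwise_agreement(clustering, authority)


def non_extreme_ratio(clustering: Clustering, min_size: int = 5, max_size: int = 100) -> float:
    """Fraction of entities in clusters that are neither tiny nor huge.

    The default bounds are illustrative assumptions, not values from the paper.
    """
    if not clustering:
        return 0.0
    sizes = Counter(clustering.values())
    in_range = sum(n for n in sizes.values() if min_size <= n <= max_size)
    return in_range / len(clustering)


if __name__ == "__main__":
    v1 = {"a.c": "ui", "b.c": "ui", "c.c": "db", "d.c": "db"}
    v2 = {"a.c": "ui", "b.c": "db", "c.c": "db", "d.c": "db", "e.c": "net"}
    authority = {"a.c": "gui", "b.c": "gui", "c.c": "store", "d.c": "store"}
    print(stability([v1, v2]))               # 0.5
    print(authoritativeness(v1, authority))  # 1.0
    print(non_extreme_ratio(v2, 2, 3))       # 0.6
```

Because the score only asks whether pairs of entities share a cluster, renaming a cluster between versions does not count as instability, which is why `v1` agrees perfectly with `authority` despite different ids. A move-and-join distance such as MoJo is a common alternative in the software-clustering literature; note that it is a distance, so lower values mean closer clusterings.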