Efficient software clustering technique using an adaptive and preventive dendrogram cutting approach

  • Authors:
  • Chun Yong Chong;Sai Peck Lee;Teck Chaw Ling

  • Affiliations:
  • Department of Software Engineering, Faculty of Computer Science and IT, University of Malaya, 50603 Lembah Pantai, Kuala Lumpur, Malaysia;Department of Software Engineering, Faculty of Computer Science and IT, University of Malaya, 50603 Lembah Pantai, Kuala Lumpur, Malaysia;Department of Computer System and Technology, Faculty of Computer Science & Information Technology, University of Malaya, 50603 Kuala Lumpur, Malaysia

  • Venue:
  • Information and Software Technology
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Context: Software clustering is a key technique that is used in reverse engineering to recover a high-level abstraction of the software in the case of limited resources. Very limited research has explicitly discussed the problem of finding the optimum set of clusters in the design and how to penalize for the formation of singleton clusters during clustering. Objective: This paper attempts to enhance the existing agglomerative clustering algorithms by introducing a complementary mechanism. To solve the architecture recovery problem, the proposed approach focuses on minimizing redundant effort and penalizing for the formation of singleton clusters during clustering while maintaining the integrity of the results. Method: An automated solution for cutting a dendrogram that is based on least-squares regression is presented in order to find the best cut level. A dendrogram is a tree diagram that shows the taxonomic relationships of clusters of software entities. Moreover, a factor to penalize clusters that will form singletons is introduced in this paper. Simulations were performed on two open-source projects. The proposed approach was compared against the exhaustive and highest gap dendrogram cutting methods, as well as two well-known cluster validity indices, namely, Dunn's index and the Davies-Bouldin index. Results: When comparing our clustering results against the original package diagram, our approach achieved an average accuracy rate of 90.07% from two simulations after the utility classes were removed. The utility classes in the source code affect the accuracy of the software clustering, owing to its omnipresent behavior. The proposed approach also successfully penalized the formation of singleton clusters during clustering. Conclusion: The evaluation indicates that the proposed approach can enhance the quality of the clustering results by guiding software maintainers through the cutting point selection process. The proposed approach can be used as a complementary mechanism to improve the effectiveness of existing clustering algorithms.