Efficient software clustering technique using an adaptive and preventive dendrogram cutting approach

Authors:
Chun Yong Chong;Sai Peck Lee;Teck Chaw Ling
Affiliations:
Department of Software Engineering, Faculty of Computer Science and IT, University of Malaya, 50603 Lembah Pantai, Kuala Lumpur, Malaysia;Department of Software Engineering, Faculty of Computer Science and IT, University of Malaya, 50603 Lembah Pantai, Kuala Lumpur, Malaysia;Department of Computer System and Technology, Faculty of Computer Science & Information Technology, University of Malaya, 50603 Kuala Lumpur, Malaysia
Venue:
Information and Software Technology
Year:
2013



Abstract

Context: Software clustering is a key reverse-engineering technique for recovering a high-level abstraction of a software system when resources are limited. Very little research has explicitly addressed how to find the optimum set of clusters in the design or how to penalize the formation of singleton clusters during clustering.

Objective: This paper aims to enhance existing agglomerative clustering algorithms by introducing a complementary mechanism. To address the architecture recovery problem, the proposed approach focuses on minimizing redundant effort and penalizing the formation of singleton clusters during clustering while maintaining the integrity of the results.

Method: An automated dendrogram-cutting solution based on least-squares regression is presented to find the best cut level. A dendrogram is a tree diagram that shows the taxonomic relationships among clusters of software entities. In addition, a factor that penalizes clusters that would form singletons is introduced. Simulations were performed on two open-source projects. The proposed approach was compared against the exhaustive and highest-gap dendrogram cutting methods, as well as two well-known cluster validity indices, namely Dunn's index and the Davies-Bouldin index.

Results: When the clustering results were compared against the original package diagram, the approach achieved an average accuracy of 90.07% across the two simulations after the utility classes were removed. Utility classes in the source code affect the accuracy of software clustering owing to their omnipresent behavior. The proposed approach also successfully penalized the formation of singleton clusters during clustering.

Conclusion: The evaluation indicates that the proposed approach can enhance the quality of the clustering results by guiding software maintainers through the cutting point selection process. It can also be used as a complementary mechanism to improve the effectiveness of existing clustering algorithms.
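
To make the comparison baselines named in the abstract concrete, the following is a minimal sketch of highest-gap dendrogram cutting, together with a singleton count and one common formulation of Dunn's index. It does not reproduce the paper's least-squares regression cut or its singleton penalty factor, whose details are not given in the abstract. The function names, the naive single-linkage routine, and the toy one-dimensional values standing in for software entities are all assumptions made for illustration.

```python
from itertools import combinations


def single_linkage(points):
    """Naive single-linkage agglomerative clustering of 1-D values.

    Returns one (merge_height, partition_after_merge) pair per merge,
    i.e. the levels of the dendrogram from bottom to top.
    """
    clusters = [[p] for p in points]
    levels = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Single linkage: distance between the two nearest members.
            d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        levels.append((d, sorted(sorted(c) for c in clusters)))
    return levels


def highest_gap_cut(levels):
    """Return the partition just below the largest jump in merge height.

    Needs at least two merges (three or more input points).
    """
    heights = [h for h, _ in levels]
    gaps = [b - a for a, b in zip(heights, heights[1:])]
    k = gaps.index(max(gaps))
    return levels[k][1]


def count_singletons(partition):
    """Number of clusters that contain exactly one entity."""
    return sum(1 for c in partition if len(c) == 1)


def dunn_index(partition):
    """Smallest between-cluster distance over largest cluster diameter."""
    separation = min(
        abs(a - b)
        for c1, c2 in combinations(partition, 2)
        for a in c1
        for b in c2
    )
    diameter = max(max(c) - min(c) for c in partition)
    # All-singleton partitions have zero diameter; treat the index as infinite.
    return float("inf") if diameter == 0 else separation / diameter


# Hypothetical toy data: six entities placed on a line.
entities = [10, 12, 14, 50, 53, 90]
levels = single_linkage(entities)
cut = highest_gap_cut(levels)
print(cut, count_singletons(cut), dunn_index(cut))
```

On this toy input the highest-gap cut yields three clusters, one of which is the singleton containing 90. This is the kind of outcome a singleton penalty is meant to discourage, although how the paper weighs it is not specified here.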