Information-Theoretic Software Clustering

Authors: Periklis Andritsos; Vassilios Tzerpos
Affiliations: IEEE Computer Society; IEEE Computer Society
Venue: IEEE Transactions on Software Engineering
Year: 2005

Abstract

The majority of algorithms in the software clustering literature use structural information to decompose large software systems. Approaches that use other attributes, such as file names or ownership information, have also demonstrated merit. At the same time, existing algorithms commonly treat all attributes of the software artifacts being clustered as equally important, a rather simplistic assumption. Moreover, no method for assessing the usefulness of a particular attribute for clustering purposes has been presented in the literature. In this paper, we present an approach that applies information-theoretic techniques to software clustering. Our approach supports the application of weighting schemes that reflect the importance of individual attributes. We introduce LIMBO, a scalable hierarchical clustering algorithm that clusters a software system by minimizing information loss. We also present a method for assessing the usefulness of any nonstructural attribute in a software clustering context. We applied LIMBO to three large software systems in a number of experiments. The results indicate that this approach produces clusterings that come close to decompositions prepared by system experts. The experimental results were also used to validate our usefulness-assessment method. Finally, we experimented with well-established weighting schemes from information retrieval, web search, and data clustering, and we report which of these schemes show merit in the decomposition of software systems.
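The idea of clustering by minimizing information loss can be illustrated with the merge cost used in agglomerative Information Bottleneck clustering. The following is a minimal Python sketch under stated assumptions, not the authors' LIMBO implementation: each artifact is a row of nonnegative attribute counts (for example, dependencies on other files, optionally extended with nonstructural attributes such as ownership), every artifact receives the same prior 1/n, and the optional `weights` vector is a hypothetical stand-in for an attribute-weighting scheme. The names `kl`, `merge`, `info_loss`, `to_clusters`, and `agglomerative_ib` are illustrative. The sketch has no summarization step, so it runs in roughly cubic time in the number of artifacts and suits only small inputs, whereas the abstract describes LIMBO as scalable.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical cluster summary: (prior p(c), conditional distribution p(A|c) over attributes).
Cluster = Tuple[float, List[float]]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL divergence D(p || q) in bits; terms where p_k == 0 contribute nothing."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def merge(a: Cluster, b: Cluster) -> Cluster:
    """Priors add; the merged conditional is the prior-weighted average of the two."""
    (pa, da), (pb, db) = a, b
    p = pa + pb
    return p, [(pa * x + pb * y) / p for x, y in zip(da, db)]


def info_loss(a: Cluster, b: Cluster) -> float:
    """Loss in I(C; A) from merging a and b: p(c*) times the weighted Jensen-Shannon divergence."""
    p, mixed = merge(a, b)
    pi_a, pi_b = a[0] / p, b[0] / p
    return p * (pi_a * kl(a[1], mixed) + pi_b * kl(b[1], mixed))


def to_clusters(rows: Sequence[Sequence[float]],
                weights: Optional[Sequence[float]] = None) -> List[Cluster]:
    """One singleton cluster per artifact, uniform prior 1/n, optional per-attribute weights."""
    n = len(rows)
    clusters = []
    for row in rows:
        w = list(row) if weights is None else [x * wk for x, wk in zip(row, weights)]
        total = sum(w)
        if total <= 0:
            raise ValueError("every artifact needs a positive total attribute weight")
        clusters.append((1.0 / n, [x / total for x in w]))
    return clusters


def agglomerative_ib(rows: Sequence[Sequence[float]], k: int,
                     weights: Optional[Sequence[float]] = None) -> List[List[int]]:
    """Greedily merge the pair with the smallest information loss until k clusters remain."""
    if not 1 <= k <= len(rows):
        raise ValueError("k must be between 1 and the number of artifacts")
    clusters = to_clusters(rows, weights)
    members = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        best_loss, best_i, best_j = math.inf, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                loss = info_loss(clusters[i], clusters[j])
                if loss < best_loss:
                    best_loss, best_i, best_j = loss, i, j
        # best_j > best_i, so deleting index best_j leaves best_i valid.
        clusters[best_i] = merge(clusters[best_i], clusters[best_j])
        members[best_i] += members[best_j]
        del clusters[best_j], members[best_j]
    return members


if __name__ == "__main__":
    # Four hypothetical files described by counts over four attributes.
    rows = [[3, 1, 0, 0], [2, 1, 0, 0], [0, 0, 1, 3], [0, 0, 2, 2]]
    print(agglomerative_ib(rows, 2))  # [[0, 1], [2, 3]]
```

Files 0 and 1 share one attribute profile and files 2 and 3 another; merging across the two groups would lose far more information than merging within them, so the two groups come out as clusters. In this sketch a weighting scheme only rescales attribute counts before normalization, which is one plausible way to plug such schemes in, not necessarily the paper's.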