Automatic Clustering of Software Systems Using a Genetic Algorithm

  • Authors:
  • D. Doval;S. Mancoridis;B. S. Mitchell

  • Venue:
  • STEP '99 Proceedings of the Software Technology and Engineering Practice
  • Year:
  • 1999

Abstract

Large software systems tend to have a rich, complex structure. This structure is typically represented as a directed graph, which we call a module dependency graph (MDG). In MDGs, source-level components and their inter-relationships are represented as nodes and directed edges, respectively.

When proper documentation is not available, software analysis tools can be used to recover MDGs automatically. Usually, even for well-designed systems, the recovered MDGs are large and complex. One way of making complex MDGs more accessible is to partition their nodes so that closely related nodes are grouped into composite nodes called clusters.

In this paper we describe a system that augments source code analysis tools by supporting the automatic partitioning of MDGs. In addition, we demonstrate our system's effectiveness by applying it to examples of software systems.

Our approach treats clustering as an optimization problem, and uses a Genetic Algorithm (GA) to search the extraordinarily large solution space of all possible partitions of an MDG for a good (possibly optimal) partition. Essential to the GA is a set of supporting algorithms we developed to quantify the quality of a partition using the relations between MDG nodes and clusters of nodes as the primary measuring parameter.
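To make the idea of "clustering as optimization" concrete, the following is a minimal sketch of a GA searching over partitions of a small MDG. It is not the authors' implementation. The abstract does not specify the quality measure, the encoding, or the GA operators, so everything here is an assumption for illustration: the function names `quality` and `cluster_mdg`, the encoding (one cluster label per node), the density-based score (average edge density inside clusters minus average edge density between clusters), and the choice of tournament selection, one-point crossover, and elitism.

```python
import random
from itertools import combinations


def quality(edges, assignment):
    """Assumed score: mean intra-cluster density minus mean inter-cluster density.

    edges      -- list of directed (source, target) node-index pairs
    assignment -- assignment[i] is the cluster label of node i
    """
    clusters = sorted(set(assignment))
    size = {c: assignment.count(c) for c in clusters}
    intra = {c: 0 for c in clusters}
    inter = {}
    for u, v in edges:
        cu, cv = assignment[u], assignment[v]
        if cu == cv:
            intra[cu] += 1
        else:
            key = (min(cu, cv), max(cu, cv))
            inter[key] = inter.get(key, 0) + 1
    # Density of edges inside each cluster, averaged over clusters.
    intra_score = sum(intra[c] / size[c] ** 2 for c in clusters) / len(clusters)
    if len(clusters) == 1:
        return intra_score
    # Density of edges between each pair of clusters, averaged over pairs.
    pairs = list(combinations(clusters, 2))
    inter_score = sum(
        inter.get((a, b), 0) / (2 * size[a] * size[b]) for a, b in pairs
    ) / len(pairs)
    return intra_score - inter_score


def cluster_mdg(num_nodes, edges, pop_size=40, generations=200,
                mutation_rate=0.05, seed=0):
    """Hypothetical GA driver; requires num_nodes >= 2."""
    rng = random.Random(seed)

    def score(a):
        return quality(edges, a)

    def pick():
        # Binary tournament selection.
        a, b = rng.sample(population, 2)
        return a if score(a) >= score(b) else b

    # Start from random partitions: each node gets a random cluster label.
    population = [[rng.randrange(num_nodes) for _ in range(num_nodes)]
                  for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: carry the best partition forward
        while len(next_pop) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, num_nodes)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: occasionally move a node to a random cluster.
            child = [rng.randrange(num_nodes) if rng.random() < mutation_rate
                     else label for label in child]
            next_pop.append(child)
        population = next_pop
        best = max(population, key=score)
    return best, score(best)


if __name__ == "__main__":
    # Two 3-node dependency cycles joined by a single edge (made-up example).
    mdg = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
    partition, value = cluster_mdg(6, mdg)
    print(partition, round(value, 4))
```

The per-node label encoding keeps crossover and mutation trivial, at the cost of many equivalent encodings of the same partition (cluster labels can be permuted freely). A GA of this kind gives no guarantee of finding the optimal partition, which matches the abstract's wording of "good (possibly optimal)".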