Semantic-driven program analysis

  • Authors:
  • Andrian Marcus;Jonathan I. Maletic

  • Venue:
  • Semantic-driven program analysis
  • Year:
  • 2003

Abstract

The tasks of maintenance and reengineering of an existing software system require considerable effort to understand the source code and to determine the behavior, organization, and architecture of the software. Different types of information (e.g., static, dynamic, source code, documentation) describe different features of the software system. There are at least two key aspects of the system that the user needs to understand: (1) what problem the software is solving and (2) how the software achieves the solution. Static analysis directly supports software comprehension, and various methods exist to perform static analysis of a software system. Most of the existing methods focus on the structural information embedded in the source code, derived mainly from the programming language syntax (e.g., control and data flow). This type of information helps the user understand how the software works. In order to understand the concepts that the software system is implementing or solving, the user needs to extract and analyze information that describes the concepts in the problem and solution domains of the system under investigation. This type of information is referred to as semantic information. The presented research advocates the use of latent semantic indexing, an information retrieval method based on the vector space model, to extract the semantic information embedded in the source code and associated documentation. The use of latent semantic indexing to support software analysis is a novel application. This type of analysis is called semantic-driven program analysis. A new model that combines structural and semantic information to support program comprehension and analysis is defined. Based on this model, a new measure for the cohesion of software is defined and related to existing measures via a known unified framework for cohesion measurement. Extensions to this framework are proposed.
Empirical results are presented that evaluate the effectiveness of semantic-driven analysis in supporting a number of reverse engineering tasks: recovery of traceability links between documentation and source code, and identification of clones in software.
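
To make the idea of latent semantic indexing over source code and documentation more concrete, the following is a minimal sketch and not the authors' implementation. It assumes NumPy is available, uses raw term counts rather than any particular term weighting, and works on a tiny made-up corpus; the names `CORPUS`, `term_document_matrix`, `lsi_vectors`, `cosine`, and `best_source_for` are hypothetical and chosen only for illustration. Each text is mapped to a vector in a reduced space obtained from a truncated singular value decomposition, and cosine similarity between a documentation section and each source unit suggests a candidate traceability link.

```python
import numpy as np

# Hypothetical toy corpus (an assumption for illustration): each entry stands
# for the identifiers and comments of one source unit, or the words of one
# documentation section.
CORPUS = {
    "src/file_reader": "open file read buffer close file",
    "doc/file_io": "read file buffer into memory",
    "src/gui": "draw window button click event",
    "doc/gui_events": "button click event handler window",
}


def term_document_matrix(texts):
    """Build a raw term-count matrix (rows: terms, columns: documents)."""
    vocab = sorted({tok for text in texts for tok in text.lower().split()})
    row = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(texts)))
    for col, text in enumerate(texts):
        for tok in text.lower().split():
            matrix[row[tok], col] += 1.0
    return matrix


def lsi_vectors(matrix, k):
    """Project each document into a k-dimensional space via truncated SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))
    # Scale the top-k right singular vectors by their singular values;
    # the result has one row per document.
    return vt[:k].T * s[:k]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def best_source_for(doc_name, names, vectors):
    """Return the source unit most similar to a documentation section."""
    d = vectors[names.index(doc_name)]
    sources = [n for n in names if n.startswith("src/")]
    return max(sources, key=lambda n: cosine(d, vectors[names.index(n)]))


NAMES = list(CORPUS)
VECTORS = lsi_vectors(term_document_matrix([CORPUS[n] for n in NAMES]), k=2)

for doc in (n for n in NAMES if n.startswith("doc/")):
    print(doc, "->", best_source_for(doc, NAMES, VECTORS))
```

In a sketch like this, the same similarity values could also be compared among source units themselves as a rough signal for clone candidates; the number of dimensions `k` and any similarity threshold are tuning choices that the abstract does not specify.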