Using latent semantic analysis to identify similarities in source code to support program understanding

  • Authors:
  • Affiliations:
  • Venue:
  • ICTAI '00 Proceedings of the 12th IEEE International Conference on Tools with Artificial Intelligence
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

Abstract: The paper describes the results of applying Latent Semantic Analysis (LSA), an advanced information retrieval method, to program source code and associated documentation. Latent semantic analysis is a corpus based statistical method for inducing and representing aspects of the meanings of words and passages (of natural language) reflective in their usage. This methodology is assessed for application to the domain of software components (i.e., source code and its accompanying documentation). Here LSA is used as the basis to cluster software components. This clustering is used to assist in the understanding of a nontrivial software system, namely a version of Mosaic. Applying latent semantic analysis to the domain of source code and internal documentation for the support of program understanding is a new application of this method and a departure from the normal application domain of natural language.