Feed-forward and recurrent neural networks for source code informal information analysis

  • Authors:
  • Ettore Merlo;Ian McAdam;Renato De Mori

  • Affiliations:
  • Département de Génie Informatique, Ecole Polytechnique de Montréal, C.P. 6079, Succ. Centre Ville, Montréal, Québec, Canada H3C 3A7;Calidris, Lagamula 5, 108 Reykjavik, Iceland;Laboratoire d'Informatique, Centre d'Enseignement et de Recherche en Informatique, Université d'Avignon, 339 chemin des Meinajariés, BP 1228, 84911 Avignon Cedex 9, France

  • Venue:
  • Journal of Software Maintenance: Research and Practice
  • Year:
  • 2003

Abstract

Design recovery, which is part of the reverse engineering of source code, must supply programmers with all the information they need to fully understand a program or a system. In this paper, a connectionist method is proposed for analyzing the informal information (comments and mnemonics) in programs; it can be used for design recovery in conjunction with more traditional approaches. An approach based on artificial neural networks (ANNs) was chosen for three reasons: its robustness (it tolerates noisy inputs), its associative memory ability (it can retrieve a concept given only the context of the input word that originally fired the concept), and its generalization power (it can learn conceptually relevant micro-features of the domain). The proposed approach combines top-down domain analysis (i.e., the creation of a concept hierarchy by a domain expert, to be used in the construction of the training set) with a bottom-up approach (i.e., the analysis of the informal information using ANNs).

A preprocessing system that extracts the relevant comments and identifier names and transforms them into an input for the ANNs has been developed. Feed-forward neural networks (FNNs) and recurrent neural networks (RNNs) were tried. RNN architectures are capable of learning sequences and can therefore make use of the word ordering of the sentence. The networks were trained on part of the source code of an existing system and tested on a different portion of the same system's code. Test results, consisting of coverage and evaluation figures, are presented. They show markedly higher accuracy for ANNs in general than for simple lexical methods. RNNs, in particular, also show higher coverage and accuracy than FNNs.
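The abstract does not describe the preprocessing system in detail, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes C-style comments, a small hypothetical keyword list, and a simple vocabulary index. All function names (`split_identifier`, `extract_words`, `build_vocab`, `bag_of_words`, `index_sequence`) are illustrative. It shows how the same extracted words could be encoded as an order-free vector (a plausible FNN input) or as an ordered index sequence (a plausible RNN input).

```python
import re

# Assumption: C-style block and line comments; string literals are not handled.
COMMENT_RE = re.compile(r"/\*(.*?)\*/|//([^\n]*)", re.DOTALL)
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Hypothetical, deliberately incomplete list of language keywords to skip.
KEYWORDS = {"int", "char", "return", "if", "else", "for", "while", "void"}


def split_identifier(name):
    """Split snake_case and camelCase identifiers into lowercase words."""
    words = []
    for part in name.split("_"):
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part))
    return [w.lower() for w in words]


def extract_words(source):
    """Return comment words first, then words taken from identifiers."""
    comments = [m.group(1) or m.group(2) or "" for m in COMMENT_RE.finditer(source)]
    code = COMMENT_RE.sub(" ", source)  # drop comments before scanning identifiers
    words = []
    for text in comments:
        words.extend(w.lower() for w in re.findall(r"[A-Za-z]+", text))
    for ident in IDENT_RE.findall(code):
        if ident not in KEYWORDS:
            words.extend(split_identifier(ident))
    return words


def build_vocab(words):
    """Map each distinct word to an index in order of first appearance."""
    vocab = {}
    for w in words:
        vocab.setdefault(w, len(vocab))
    return vocab


def bag_of_words(words, vocab):
    """Order-free binary vector: one slot per vocabulary word."""
    vec = [0] * len(vocab)
    for w in words:
        if w in vocab:
            vec[vocab[w]] = 1
    return vec


def index_sequence(words, vocab):
    """Order-preserving encoding: one vocabulary index per known word."""
    return [vocab[w] for w in words if w in vocab]
```

The bag-of-words vector discards word order, whereas the index sequence keeps it, which is the property the abstract attributes to the RNN architectures.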