Feed-forward and recurrent neural networks for source code informal information analysis

Authors:
Ettore Merlo; Ian McAdam; Renato De Mori
Affiliations:
Département de Génie Informatique, Ecole Polytechnique de Montréal, C.P. 6079, Succ. Centre Ville, Montréal, Québec, Canada H3C 3A7; Calidris, Lagamula 5, 108 Reykjavik, Iceland; Laboratoire d'Informatique, Centre d'Enseignement et de Recherche en Informatique, Université d'Avignon, 339 chemin des Meinajariés, BP 1228, 84911 Avignon Cedex 9, France
Venue:
Journal of Software Maintenance: Research and Practice
Year:
2003

Abstract

Design recovery, which is part of the reverse engineering of source code, must supply programmers with all the information they need to fully understand a program or a system. This paper proposes a connectionist method for analyzing the informal information (comments and mnemonics) in programs, to be used for design recovery in conjunction with more traditional approaches. An approach based on artificial neural networks (ANNs) was chosen for its robustness (tolerance of noisy inputs), its associative memory (the ability to retrieve a concept given only the context of the input word that originally fired the concept), and its generalization power (the ability to learn conceptually relevant micro-features of the domain). The proposed approach combines top-down domain analysis (i.e., the creation of a concept hierarchy by a domain expert, to be used in the construction of the training set) with a bottom-up approach (i.e., the analysis of the informal information using ANNs).

A preprocessing system has been developed that extracts the relevant comments and identifier names and transforms them into an input for the ANNs. Both feed-forward neural networks (FNNs) and recurrent neural networks (RNNs) were tried. RNN architectures are capable of learning sequences and can therefore make use of the word ordering of the sentence. The networks were trained on part of the source code of an existing system and tested on a different portion of the same system's code. Test results, consisting of coverage and evaluation figures, are presented. They show remarkably higher accuracy for ANNs in general than for simple lexical methods. RNNs in particular also show higher coverage and accuracy than FNNs.
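
The abstract does not describe the preprocessing system or the input encoding in detail, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes C-style source (the analyzed language is not named here), uses hypothetical regular expressions and function names, and substitutes a simple word-count vector and a word-index sequence for whatever encodings the authors actually used. It is meant to illustrate why an order-free input suits a feed-forward network while an ordered one lets a recurrent network exploit word ordering.

```python
import re

# Assumption: C-style comments and identifiers. These patterns are illustrative only
# and ignore corner cases such as comment markers inside string literals.
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")
C_KEYWORDS = {"int", "char", "void", "return", "if", "else", "for", "while", "struct"}


def split_words(text):
    """Split free text or an identifier (snake_case or camelCase) into lowercase words."""
    return [w.lower() for w in WORD_RE.findall(text)]


def extract_informal_words(source):
    """Collect words from comments first, then from non-keyword identifiers."""
    comments = COMMENT_RE.findall(source)
    code_only = COMMENT_RE.sub(" ", source)
    words = []
    for comment in comments:
        words.extend(split_words(comment))
    for ident in IDENT_RE.findall(code_only):
        if ident not in C_KEYWORDS:
            words.extend(split_words(ident))
    return words


def bag_of_words(words, vocab):
    """Order-free count vector: a fixed-size input of the kind an FNN could take."""
    vec = [0] * len(vocab)
    for w in words:
        if w in vocab:
            vec[vocab[w]] += 1
    return vec


def index_sequence(words, vocab):
    """Order-preserving index list: an input an RNN could consume one word at a time."""
    return [vocab[w] for w in words if w in vocab]


# Hypothetical source fragment and vocabulary, invented for this illustration.
source = """
/* Open the flight record */
int openFlightRecord(int flight_id) { return flight_id; }
"""
vocab = {"open": 0, "flight": 1, "record": 2, "id": 3}

words = extract_informal_words(source)
print(bag_of_words(words, vocab))
print(index_sequence(words, vocab))
```

Both encodings are built from the same extracted words; only the second keeps the order in which they appear, which is the property the abstract credits RNNs with exploiting.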