Analyzing the Evolution of the Source Code Vocabulary

Authors:
Surafel Lemma Abebe;Sonia Haiduc;Andrian Marcus;Paolo Tonella;Giuliano Antoniol
Venue:
CSMR '09 Proceedings of the 2009 European Conference on Software Maintenance and Reengineering
Year:
2009

Cited by (6):

Automatic quality assessment of source code comments: the JavadocMiner
NLDB'10 Proceedings of the Natural language processing and information systems, and 15th international conference on Applications of natural language to information systems

An exploratory study of identifier renamings
Proceedings of the 8th Working Conference on Mining Software Repositories

Quantifying the similarities between source code lexicons
Proceedings of the 49th Annual Southeast Regional Conference

Toward an understanding of the relationship between the identifier and comment lexicons
Proceedings of the 49th Annual Southeast Regional Conference

Supporting concept location through identifier parsing and ontology extraction
Journal of Systems and Software

Assessing the quality factors found in in-line documentation written in natural language: The JavadocMiner
Data & Knowledge Engineering

Abstract

Source code is a mixed software artifact, containing information for both the compiler and the developers. While the programming language grammar dictates how source code is written, developers have considerable freedom in choosing identifiers and writing comments. These choices are intentional and serve as a means of communication between developers. The goal of this paper is to analyze how the source code vocabulary changes during evolution, through an exploratory study of two software systems. Specifically, we collected data to answer a set of questions about vocabulary evolution, such as: How does the size of the source code vocabulary evolve over time? What do the most frequent terms refer to? Do new identifiers introduce new terms? Are there terms shared between different types of identifiers and comments? Are terms added to or deleted from one type of identifier mirrored in other types of identifiers or in comments?
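Most of the questions above reduce to set and frequency operations over terms. The following is a minimal illustrative sketch, not the paper's extraction procedure: it assumes the vocabulary of a version is the set of lowercase terms obtained by splitting identifiers on underscores, digits and camelCase boundaries, and that comment terms are simply alphabetic words. All function names and splitting rules here are hypothetical.

```python
import re
from collections import Counter

# Illustrative sketch: the splitting rules below are assumptions, not the paper's method.


def split_identifier(identifier: str) -> list[str]:
    """Split an identifier into lowercase terms (hypothetical splitting rule)."""
    terms = []
    # Underscores and digits act as separators: "max_file_size2" -> max, file, size
    for part in re.split(r"[_\d]+", identifier):
        # camelCase and acronym runs: "parseHTTPRequest" -> parse, HTTP, Request
        terms.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return [term.lower() for term in terms]


def comment_terms(comment: str) -> set[str]:
    """Distinct lowercase alphabetic words appearing in a comment."""
    return set(re.findall(r"[a-z]+", comment.lower()))


def identifier_vocabulary(identifiers: list[str]) -> set[str]:
    """Distinct terms over all identifiers of one version (vocabulary size = len)."""
    return {term for ident in identifiers for term in split_identifier(ident)}


def term_frequencies(identifiers: list[str]) -> Counter:
    """How often each term occurs across identifiers, for ranking frequent terms."""
    return Counter(term for ident in identifiers for term in split_identifier(ident))


def added_and_deleted(old_vocab: set[str], new_vocab: set[str]) -> tuple[set[str], set[str]]:
    """Terms introduced and removed between two versions."""
    return new_vocab - old_vocab, old_vocab - new_vocab


if __name__ == "__main__":
    v1 = identifier_vocabulary(["getFileName", "MAX_SIZE"])
    v2 = identifier_vocabulary(["getFilePath", "parseHTTPRequest"])
    added, deleted = added_and_deleted(v1, v2)
    print(len(v1), len(v2))   # 5 6
    print(sorted(added))      # ['http', 'parse', 'path', 'request']
    print(sorted(deleted))    # ['max', 'name', 'size']
    # Terms shared between the identifiers of v2 and a comment
    print(sorted(v2 & comment_terms("Parse the request and return its path")))
    # ['parse', 'path', 'request']
    print(term_frequencies(["getFileName", "getFilePath", "setFileName"]).most_common(1))
    # [('file', 3)]
```

Splitting per identifier type (class, method, field names) and comparing the resulting sets in the same way would address the questions about terms shared or mirrored across identifier types; that grouping is likewise an assumption of this sketch.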