An empirical study of rules for well-formed identifiers: Research Articles

  • Authors:
  • Dawn Lawrie;Henry Feild;David Binkley

  • Affiliations:
  • Computer Science Department, Loyola College, 4501 N. Charles St., Baltimore, MD 21210-2699, U.S.A.;Computer Science Department, Loyola College, 4501 N. Charles St., Baltimore, MD 21210-2699, U.S.A.;Computer Science Department, Loyola College, 4501 N. Charles St., Baltimore, MD 21210-2699, U.S.A.

  • Venue:
  • Journal of Software Maintenance and Evolution: Research and Practice - Source Code Analysis and Manipulation (SCAM 2006)
  • Year:
  • 2007

Abstract

Readers of programs have two main sources of domain information: identifier names and comments. To maintain source code efficiently, it is important that identifier names (as well as comments) clearly communicate the concepts they represent. Deißenböck and Pizka recently introduced two rules for creating well-formed identifiers: one considers the consistency of identifiers and the other their conciseness. These rules require a mapping from identifiers to the concepts they represent, which may be costly to develop after the initial release of a system. This paper develops an approach for verifying whether identifiers are well formed without any additional information (e.g., a concept mapping). Using a pool of 48 million lines of code, experiments with the resulting syntactic rules for well-formed identifiers show that violations of the syntactic pattern do occur. Two case studies show that three-quarters of these violations are ‘real’; that is, they could also be identified using a concept mapping. Three related studies show that programmers tend to use a rather limited vocabulary; that, contrary to many other aspects of system evolution, maintenance does not introduce additional rule violations; and that open and proprietary sources differ in their percentage of violations. Copyright © 2007 John Wiley & Sons, Ltd.
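The abstract does not spell out the syntactic rules themselves, so the following is only a rough illustration of how a check without a concept mapping might look. It assumes a simple word-containment heuristic that is not taken from the paper: split each identifier into words on underscores and camelCase boundaries, then flag pairs in which one identifier's words appear as a contiguous run inside another's (for example, `file` inside `fileName`). Such pairs are candidate naming problems that a human, or a concept mapping, would still need to confirm. The functions `split_identifier`, `is_contained`, and `flag_containment_pairs` are hypothetical names introduced here.

```python
import re
from itertools import permutations


def split_identifier(name: str) -> list[str]:
    """Split an identifier into lower-case words on underscores, digits,
    and camelCase boundaries, keeping acronyms such as "HTTP" together."""
    words = []
    for part in re.split(r"[_\W\d]+", name):
        # Either an all-caps run not followed by a lower-case letter (acronym),
        # or an optional capital followed by lower-case letters.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return [w.lower() for w in words if w]


def is_contained(short: list[str], long: list[str]) -> bool:
    """True if `short` occurs as a contiguous run of words inside the
    strictly longer word list `long`."""
    n = len(short)
    if n == 0 or n >= len(long):
        return False
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def flag_containment_pairs(identifiers: list[str]) -> list[tuple[str, str]]:
    """Return sorted (inner, outer) pairs where inner's words appear inside
    outer's words. This is an assumed heuristic, not the paper's rule."""
    words = {ident: split_identifier(ident) for ident in identifiers}
    return sorted(
        (a, b)
        for a, b in permutations(identifiers, 2)
        if is_contained(words[a], words[b])
    )
```

For example, `flag_containment_pairs(["file", "fileName", "file_name_len", "count", "maxLen"])` flags `file` inside both longer names and `fileName` inside `file_name_len`, while leaving `count` and `maxLen` alone. Working on word sequences rather than raw substrings avoids spurious hits such as `len` matching inside an unrelated word.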