Extracting Meaning from Abbreviated Identifiers

  • Authors: Dawn Lawrie; Henry Feild; David Binkley
  • Affiliations: Loyola College, USA; Loyola College, USA; Loyola College, USA
  • Venue: SCAM '07 Proceedings of the Seventh IEEE International Working Conference on Source Code Analysis and Manipulation
  • Year: 2007

Abstract

Informative identifiers are made up of full (natural language) words and (meaningful) abbreviations. Readers of programs typically have little trouble understanding the purpose of identifiers composed of full words. In addition, those familiar with the code can (most often) determine the meaning of abbreviations used in identifiers. However, when faced with unfamiliar code, abbreviations often carry little useful information. Furthermore, tools that focus on the natural language used in the code have a hard time in the presence of abbreviations. One approach to providing meaning to programmers and tools is to translate (expand) abbreviations into full words. This paper presents a methodology for expanding identifiers and evaluates the process on a code base of just over 35 million lines of code. For example, using phrase extraction, fs_exists is expanded to file_status_exists, illustrating how the expansion process can facilitate comprehension. On average, 16 percent of the identifiers in a program are expanded. Finally, as an example application, the approach is used to improve the syntactic identification of violations of Deißenböck and Pizka's rules for concise and consistent identifier construction.
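
The idea of expanding an abbreviated identifier can be illustrated with a minimal sketch. It is not the paper's method: the abstract mentions phrase extraction, which is not modeled here. The lookup table EXPANSIONS and the function names split_identifier and expand_identifier are hypothetical, chosen only for illustration. The sketch splits an identifier into tokens and replaces any token found in a hand-written abbreviation table.

```python
import re

# Hypothetical, hand-written abbreviation table (an assumption for this sketch).
# The abstract mentions phrase extraction as a source of expansions; that step
# is not modeled here.
EXPANSIONS = {
    "fs": ["file", "status"],
    "cnt": ["count"],
    "buf": ["buffer"],
}


def split_identifier(identifier):
    """Split an identifier on underscores and camel-case boundaries."""
    tokens = []
    for chunk in identifier.split("_"):
        # Acronym runs, capitalized or lowercase words, and digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [t.lower() for t in tokens]


def expand_identifier(identifier):
    """Replace each token that has a known expansion; keep the rest as-is."""
    words = []
    for token in split_identifier(identifier):
        words.extend(EXPANSIONS.get(token, [token]))
    return "_".join(words)


if __name__ == "__main__":
    print(expand_identifier("fs_exists"))
```

With this table, fs_exists becomes file_status_exists, matching the example in the abstract. A fixed table cannot choose between competing expansions of the same abbreviation, which is one reason a context-sensitive source of candidates would be needed in practice.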