Quantifying identifier quality: an analysis of trends

  • Authors: Dawn Lawrie, Henry Feild, David Binkley
  • Affiliation (all authors): Loyola College in Maryland, Baltimore, USA 21210
  • Venue: Empirical Software Engineering
  • Year: 2007

Abstract

Identifiers, which represent the defined concepts in a program, account by some measures for almost three quarters of source code. The makeup of identifiers plays a key role in how well they communicate these defined concepts. This paper presents an empirical study of identifier quality based on almost 50 million lines of code, covering thirty years, four programming languages, and both open source and proprietary code. For the purposes of the study, identifier quality is conservatively defined as the possibility of constructing the identifier out of dictionary words or known abbreviations. Four hypotheses related to identifier quality are considered using linear mixed effect regression models. For example, the first hypothesis is that modern programs include higher quality identifiers than older ones. In this case, the results show that better programming practices are producing higher quality identifiers. Results also confirm some commonly held beliefs, such as proprietary code having more acronyms than open source code.
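
The quality definition in the abstract (an identifier counts as well formed if it can be built from dictionary words or known abbreviations) can be illustrated with a minimal sketch. This is not the authors' tool: the word lists, the function names (`split_hard_words`, `can_compose`, `is_well_formed`), and the splitting rules are all assumptions chosen for illustration, and the study's actual dictionaries and splitting procedure are not described in the abstract.

```python
import re
from typing import List, Set

# Hypothetical, tiny word lists for illustration only; not the study's data.
DICTIONARY: Set[str] = {"count", "file", "name", "index", "line", "total", "user"}
ABBREVIATIONS: Set[str] = {"idx", "len", "ptr", "str", "num", "buf"}


def split_hard_words(identifier: str) -> List[str]:
    """Split an identifier at underscores, digits, and camelCase boundaries."""
    words: List[str] = []
    for part in re.split(r"[_\d]+", identifier):
        # Either a run of capitals (an acronym) or an optionally
        # capitalized lowercase run.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return [w.lower() for w in words]


def can_compose(word: str, vocabulary: Set[str]) -> bool:
    """Return True if word is a concatenation of one or more vocabulary entries."""
    # reachable[i] is True when word[:i] can be built from vocabulary entries.
    reachable = [True] + [False] * len(word)
    for end in range(1, len(word) + 1):
        for start in range(end):
            if reachable[start] and word[start:end] in vocabulary:
                reachable[end] = True
                break
    return reachable[len(word)]


def is_well_formed(identifier: str) -> bool:
    """Assumed reading of the abstract's definition: every part must be composable."""
    vocabulary = DICTIONARY | ABBREVIATIONS
    words = split_hard_words(identifier)
    return bool(words) and all(can_compose(w, vocabulary) for w in words)
```

Under these assumed word lists, `is_well_formed("fileName")` and `is_well_formed("usercount")` hold, while `is_well_formed("xqz")` does not. The dynamic-programming check in `can_compose` is one simple way to handle identifiers with no explicit word separators; a real analysis would need far larger dictionaries and a more careful splitter.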