Quantifying identifier quality: an analysis of trends

Authors:
Dawn Lawrie;Henry Feild;David Binkley
Affiliations:
Loyola College in Maryland, Baltimore, USA 21210;Loyola College in Maryland, Baltimore, USA 21210;Loyola College in Maryland, Baltimore, USA 21210
Venue:
Empirical Software Engineering
Year:
2007

Abstract

Identifiers, which represent the defined concepts in a program, account for, by some measures, almost three quarters of source code. The makeup of identifiers plays a key role in how well they communicate these defined concepts. An empirical study of identifier quality based on almost 50 million lines of code, covering thirty years, four programming languages, and both open and proprietary source, is presented. For the purposes of the study, identifier quality is conservatively defined as the possibility of constructing the identifier out of dictionary words or known abbreviations. Four hypotheses related to identifier quality are considered using linear mixed-effects regression models. For example, the first hypothesis is that modern programs include higher quality identifiers than older ones. In this case, the results show that better programming practices are producing higher quality identifiers. Results also confirm some commonly held beliefs, such as proprietary code having more acronyms than open source code.
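The quality definition in the abstract (an identifier counts as high quality if it can be constructed from dictionary words or known abbreviations) can be illustrated with a minimal sketch. This is not the authors' tool: the word lists `DICTIONARY` and `ABBREVIATIONS`, the splitting rules, and the function names are all hypothetical stand-ins chosen for illustration.

```python
import re

# Hypothetical, tiny stand-ins for a real dictionary and abbreviation list.
DICTIONARY = {"file", "name", "count", "index", "buffer", "size", "max"}
ABBREVIATIONS = {"buf", "idx", "cnt", "len", "ptr"}


def split_identifier(identifier):
    """Split on underscores, digits, and lower-to-upper camelCase boundaries.

    Assumption: runs of capitals (e.g. "HTTPServer") are not split further.
    """
    tokens = []
    for part in re.split(r"[_\d]+", identifier):
        # Insert a space between a lowercase letter and a following capital.
        tokens.extend(re.sub(r"([a-z])([A-Z])", r"\1 \2", part).split())
    return [t.lower() for t in tokens if t]


def can_segment(token, vocabulary):
    """Return True if token is a concatenation of one or more vocabulary entries."""
    # reachable[i] is True when token[:i] can be built from vocabulary entries.
    reachable = [True] + [False] * len(token)
    for end in range(1, len(token) + 1):
        for start in range(end):
            if reachable[start] and token[start:end] in vocabulary:
                reachable[end] = True
                break
    return reachable[len(token)]


def is_well_formed(identifier, dictionary=DICTIONARY, abbreviations=ABBREVIATIONS):
    """Check whether every part of the identifier is built from known words."""
    vocabulary = dictionary | abbreviations
    tokens = split_identifier(identifier)
    return bool(tokens) and all(can_segment(t, vocabulary) for t in tokens)
```

Under these assumptions, `fileName`, `max_buf_size`, and the unseparated `bufsize` all pass the check, while `xqz` does not. A real study would substitute a full dictionary and a curated abbreviation list for the toy sets used here.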