Identifier length and limited programmer memory

Authors:
Dave Binkley;Dawn Lawrie;Steve Maex;Christopher Morrell
Affiliations:
Loyola College, Baltimore MD, 21210-2699, USA;Loyola College, Baltimore MD, 21210-2699, USA;Loyola College, Baltimore MD, 21210-2699, USA;Loyola College, Baltimore MD, 21210-2699, USA
Venue:
Science of Computer Programming
Year:
2009

Abstract

Because early variable mnemonics were limited to as few as six to eight characters, many early programmers abbreviated concepts in their variable names. The past thirty years have seen a steady increase in permitted name length and, slowly, an increase in the actual identifier length. However, in theory names can be too long for programmers to comprehend and manipulate effectively. Most obviously, in object-oriented programs, entity naming often involves chaining of method calls and field selectors (e.g., class.firstAssignment().name.trim()). While longer names bring the potential for better comprehension through more embedded sub-words, there are practical limits to their length given limited human memory resources. The driving hypothesis behind the presented study is that names used in modern programs have reached this limit. Thus, a goal of the study is to better understand the balance between longer, more expressive names and limited programmer memory resources. Statistical models derived from an experiment involving 158 programmers of varying degrees of experience show that longer names extracted from production code take more time to process and reduce correctness in a simple recall activity. This has clear negative implications for any attempt to read, and hence comprehend or manipulate, the source code found in modern software. The experiment also evaluates the advantage of identifiers having probable ties to a programmer's persistent memory. Combined, these results reinforce past proposals advocating the use of limited, consistent, and regular vocabulary in identifier names. In particular, good naming limits individual name length and reduces the need for specialized vocabulary.
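
To make the notions of name length and embedded sub-words concrete, the following minimal sketch profiles the chained expression quoted in the abstract. It is an illustration only, not the instrument used in the study: the function names `split_subwords` and `chain_profile` are hypothetical, and the splitting rule (underscores plus camelCase boundaries) is an assumed heuristic.

```python
import re


def split_subwords(identifier: str) -> list[str]:
    """Split one identifier on underscores and camelCase boundaries.

    Assumed heuristic for illustration; real identifier splitters
    handle many more cases (abbreviations, fused lowercase words).
    """
    words = []
    for part in identifier.split("_"):
        # Acronym runs, capitalized or lowercase words, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return words


def chain_profile(expression: str) -> dict:
    """Summarize a chained expression: total characters, names, sub-words."""
    # Every maximal run of identifier characters counts as one name.
    names = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", expression)
    subwords = [word for name in names for word in split_subwords(name)]
    return {
        "characters": len(expression),
        "names": names,
        "subwords": subwords,
    }


if __name__ == "__main__":
    profile = chain_profile("class.firstAssignment().name.trim()")
    print(profile["characters"], "characters")
    print(len(profile["names"]), "names:", profile["names"])
    print(len(profile["subwords"]), "sub-words:", profile["subwords"])
```

Under these assumptions, a reader has to hold the whole chain, not just one identifier, in working memory, which is why the sketch reports the length of the full expression alongside the individual names and their sub-words.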