Normalizing source code vocabulary to support program comprehension and software quality

Authors: Latifa Guerrouj
Affiliations: Polytechnique Montréal, Canada
Venue: Proceedings of the 2013 International Conference on Software Engineering
Year: 2013



Abstract

The literature reports that the source code lexicon plays a paramount role in program comprehension, especially when software documentation is scarce, outdated, or simply not available. In source code, a significant proportion of the vocabulary consists of acronyms, abbreviations, or concatenations of terms that cannot be identified using consistent mechanisms such as naming conventions. It is therefore essential to disambiguate the concepts conveyed by identifiers, both to support program comprehension and to reap the full benefit of Information Retrieval-based techniques (e.g., feature location and traceability), which require the linguistic information (i.e., source code identifiers and comments) used across all software artifacts (e.g., requirements, design, change requests, tests, and source code) to be consistent. To this end, we propose source code vocabulary normalization approaches that exploit contextual information to align the vocabulary found in the source code with that found in other software artifacts. Our choice of context levels was inspired by prior work and by our own findings. Normalization consists of two tasks: splitting and expansion of source code identifiers. We also investigate the effect of source code vocabulary normalization approaches on software maintenance tasks. Results of our evaluation show that our context-aware techniques are more accurate and more efficient in terms of computation time than state-of-the-art alternatives. In addition, our findings reveal that feature location techniques can benefit from vocabulary normalization approaches when no dynamic information is available.
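To make the two normalization tasks concrete, the following is a minimal Python sketch of splitting followed by expansion. It is not the approach proposed in the paper: it splits only on naming-convention cues (underscores, camelCase, digits) and expands tokens through a small hand-written dictionary, whereas the abstract describes techniques that draw on contextual information and handle identifiers that conventions alone cannot split. The names `ABBREVIATIONS`, `split_identifier`, `expand_tokens`, and `normalize` are hypothetical and chosen for illustration only.

```python
import re

# Hypothetical abbreviation dictionary (an assumption for this sketch).
# A context-aware approach would instead draw candidate expansions from
# surrounding artifacts such as comments or the enclosing function or file.
ABBREVIATIONS = {
    "cnt": "count",
    "ptr": "pointer",
    "usr": "user",
    "msg": "message",
    "buf": "buffer",
}


def split_identifier(identifier):
    """Split an identifier on underscores, camelCase boundaries, and digits."""
    tokens = []
    # First cut on underscores and any non-word characters.
    for part in re.split(r"[_\W]+", identifier):
        # Then find acronym runs, capitalized or lowercase words, and digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in tokens]


def expand_tokens(tokens, dictionary=ABBREVIATIONS):
    """Replace each known abbreviation with its expansion; keep other tokens."""
    return [dictionary.get(t, t) for t in tokens]


def normalize(identifier):
    """Splitting followed by expansion, the two tasks named in the abstract."""
    return expand_tokens(split_identifier(identifier))


if __name__ == "__main__":
    for name in ("usrMsgCnt", "HTMLParser_buf", "buf2ptr"):
        print(name, "->", normalize(name))
```

An identifier such as `usrMsgCnt` is split into `usr`, `msg`, `cnt` and then expanded to `user`, `message`, `count`. An identifier with no convention cues (for example, all-lowercase concatenations) would pass through this sketch unsplit, which is the kind of case that motivates the context-aware techniques described above.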