Recognizing Words from Source Code Identifiers Using Speech Recognition Techniques

Authors:
Nioosha Madani;Latifa Guerrouj;Massimiliano Di Penta;Yann-Gael Gueheneuc;Giuliano Antoniol
Venue:
CSMR '10 Proceedings of the 2010 14th European Conference on Software Maintenance and Reengineering
Year:
2010

Abstract

The existing software engineering literature has empirically shown that a proper choice of identifiers influences software understandability and maintainability. Researchers have noticed that identifiers are one of the most important sources of information about program entities and that the semantics of identifiers guide the cognitive process. Recognizing the words forming identifiers is not an easy task when naming conventions (e.g., Camel Case) are not used or not strictly followed, and/or when these words have been abbreviated or otherwise transformed. This paper proposes a technique inspired by speech recognition, i.e., dynamic time warping, to split identifiers into component words. The proposed technique has been applied to identifiers extracted from two different applications: JHotDraw and Lynx. Results, compared against manually built oracles and against a Camel Case splitting algorithm, are encouraging. In fact, they show that the technique successfully recognizes the words composing identifiers (even when abbreviated) in about 90% of cases and that it performs better than Camel Case. Furthermore, it was able to spot mistakes in the manually built oracle.
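
To make the two approaches named in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a simple regex-based Camel Case splitter as the baseline and uses the textbook dynamic time warping recurrence over characters with a 0/1 mismatch cost. The function names `split_camel_case` and `dtw_distance`, and the choice of cost function, are illustrative assumptions; the paper's actual matching procedure and dictionary are not described in this abstract.

```python
import re


def split_camel_case(identifier):
    """Baseline splitter (assumed form): break on case changes, digits, underscores."""
    # Acronym runs (XML), capitalized or lowercase words (Parser, get), digit runs.
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)


def dtw_distance(a, b):
    """Textbook DTW between two character sequences.

    Assumption: local cost is 0 when characters match and 1 otherwise.
    This is only an illustration of the DTW idea, not the paper's algorithm.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = cost of the best warping path aligning a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


if __name__ == "__main__":
    # Camel Case works when the convention is followed...
    print(split_camel_case("getXMLParser"))
    # ...but returns the identifier unsplit when it is not.
    print(split_camel_case("userid"))
    # An abbreviation is closer to the word it abbreviates than to an unrelated one.
    for word in ("counter", "pointer"):
        print(word, dtw_distance("cntr", word))
```

In this sketch, the abbreviated term `cntr` has a smaller warping distance to the hypothetical dictionary word `counter` than to `pointer`, which hints at why a warping-based match can recognize abbreviated words that a purely case-based splitter cannot.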