Software engineering and evolution techniques have recently started to exploit the natural language information in source code. A key step in doing so is splitting identifiers into their constituent words. While simple in concept, identifier splitting raises several challenging issues, which has led to a range of splitting techniques. Consequently, the research community would benefit from a dataset (i.e., a gold set) that facilitates comparative studies of identifier splitting techniques. A gold set of 2,663 split identifiers was constructed from 8,522 individual human splitting judgements and can be obtained from www.cs.loyola.edu/~binkley/ludiso. The construction of this set is described, along with observations aimed at its effective use.
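To make the idea of comparing splitters against a gold set concrete, the following is a minimal sketch of a conservative baseline splitter scored against a handful of reference splits. It is not the method used to build the dataset. The function names, the decision to drop digits, and the `toy_gold` entries are assumptions made up for illustration; they are not drawn from the published gold set, whose file format is not described here.

```python
import re

def split_identifier(identifier):
    """Conservative baseline: split on underscores, digits, and camelCase boundaries.

    Assumption for this sketch: digits and punctuation act as separators
    and are dropped from the result.
    """
    words = []
    for part in re.split(r"[_\W\d]+", identifier):
        # An acronym run (e.g. "HTTP") or an optionally capitalised lowercase word.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return [w.lower() for w in words]

def accuracy(splitter, gold):
    """Fraction of identifiers whose split exactly matches the reference split."""
    correct = sum(1 for ident, expected in gold.items() if splitter(ident) == expected)
    return correct / len(gold)

# Hypothetical reference splits, invented for this example only.
toy_gold = {
    "getHTTPResponse": ["get", "http", "response"],
    "file_name": ["file", "name"],
    "maxValue2": ["max", "value"],
    "usrname": ["usr", "name"],  # same-case identifier with no explicit boundary
}

if __name__ == "__main__":
    for ident in toy_gold:
        print(ident, "->", split_identifier(ident))
    print("accuracy:", accuracy(split_identifier, toy_gold))
```

The baseline handles explicit markers such as underscores and case changes, but it leaves `usrname` unsplit. Same-case identifiers of this kind are one of the challenges that motivate more elaborate splitting techniques, and a shared gold set allows such techniques to be scored on the same inputs.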