Recent software development tools have exploited the mining of natural language information found within software and its supporting documentation. To make the most of this information, researchers have drawn upon the work of the natural language processing community for tools and techniques. One such tool provides part-of-speech information, which finds application in improving the searching of software repositories and in extracting domain information found in identifiers. Unfortunately, the natural language found in software differs from that found in standard prose, and this difference potentially limits the effectiveness of off-the-shelf tools. An empirical investigation finds that, with minimal guidance, an existing tagger was correct 88% of the time when tagging the words found in source code identifiers. The investigation then uses the improved part-of-speech information to tag a large corpus of over 145,000 structure-field names. From patterns in the tags, several rules emerge that seek to understand past usage and to improve future naming.