Improving identifier informativeness using part of speech information

Authors:
Dave Binkley; Matthew Hearn; Dawn Lawrie
Affiliations:
Loyola University Maryland, Baltimore, USA; Loyola University Maryland, Baltimore, USA; Loyola University Maryland, Baltimore, USA
Venue:
Proceedings of the 8th Working Conference on Mining Software Repositories
Year:
2011

Abstract

Recent software development tools have exploited the natural language information found within software and its supporting documentation. To make the most of this information, researchers have drawn on tools and techniques from the natural language processing community. One such tool is a part-of-speech tagger, whose output has been applied to improving search over software repositories and to extracting domain information from identifiers. Unfortunately, the natural language found in software differs from that found in standard prose, and this difference may limit the effectiveness of off-the-shelf tools. An empirical investigation finds that, with minimal guidance, an existing tagger correctly tags the words found in source code identifiers 88% of the time. The investigation then uses this improved part-of-speech information to tag a large corpus of over 145,000 structure-field names. Patterns in the resulting tags yield several rules that seek to explain past naming practice and to improve future naming.
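The abstract does not spell out the tagging pipeline, so the following is only a minimal sketch of the general idea: split each structure-field name into words, map each word to a part-of-speech tag, and count the resulting tag patterns across a corpus. It assumes a tiny hand-written lexicon (`TOY_LEXICON`) in place of the off-the-shelf tagger the paper evaluates, and it does not reproduce the paper's "minimal guidance" of that tagger. All names here (`TOY_LEXICON`, `split_identifier`, `tag_pattern`, `count_patterns`) are hypothetical; tags follow Penn Treebank conventions.

```python
import re
from collections import Counter

# HYPOTHETICAL toy lexicon standing in for a real part-of-speech tagger.
# Penn Treebank tags: NN noun, NNS plural noun, JJ adjective,
# VBZ verb (3rd person singular present).
TOY_LEXICON = {
    "max": "JJ",
    "buffer": "NN",
    "size": "NN",
    "next": "JJ",
    "node": "NN",
    "is": "VBZ",
    "empty": "JJ",
    "num": "NN",
    "items": "NNS",
}


def split_identifier(name: str) -> list[str]:
    """Split an identifier on underscores and camelCase boundaries, lowercasing each word."""
    words = []
    for part in name.split("_"):
        # Acronym runs (e.g. "HTTP" in "HTTPServer"), capitalized or lowercase words, digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def tag_pattern(name: str, lexicon: dict[str, str]) -> str:
    """Return the space-separated tag sequence for an identifier; unknown words get 'UNK'."""
    return " ".join(lexicon.get(w, "UNK") for w in split_identifier(name))


def count_patterns(names, lexicon: dict[str, str]) -> Counter:
    """Count how often each tag pattern occurs across a collection of field names."""
    return Counter(tag_pattern(n, lexicon) for n in names)


if __name__ == "__main__":
    fields = ["maxBufferSize", "next_node", "maxSize", "isEmpty", "num_items"]
    print(count_patterns(fields, TOY_LEXICON).most_common(1))  # [('JJ NN', 2)]
```

Counting whole tag sequences, rather than individual tags, is what lets recurring shapes such as adjective-noun field names stand out; a real study would replace the toy lexicon with an actual tagger and a far larger corpus.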