Automatically mining software-based, semantically-similar words from comment-code mappings

Authors:
Matthew J. Howard; Samir Gupta; Lori Pollock; K. Vijay-Shanker
Affiliations:
University of Delaware, USA; University of Delaware, USA; University of Delaware, USA; University of Delaware, USA
Venue:
Proceedings of the 10th Working Conference on Mining Software Repositories
Year:
2013


Abstract

Many software development and maintenance tools involve matching natural language words across different software artifacts (e.g., traceability) or matching user queries against software artifacts (e.g., code search). Because the queries and the various artifacts were likely created by different people, the effectiveness of these tools is often improved by expanding queries and adding related words to textual artifact representations. Synonyms, along with other word relations that indicate semantic similarity, are particularly useful for overcoming the mismatch in vocabularies. However, experience shows that many words are semantically similar in computer science contexts but not in typical natural language documents. In this paper, we present an automatic technique to mine semantically similar words, particularly in the software context. We leverage the role of leading comments for methods and the conventions programmers follow in writing them. Our evaluation shows that the mined comment-code word mappings that do not already occur in WordNet are, in high proportions, indeed viewed as semantically similar word pairs in computer science.
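
The abstract does not spell out the mining procedure, so the following is only a minimal sketch of the general idea of pairing a word from a method's leading comment with a word from the method's name. It assumes (hypothetically) that the first word of an imperative-style comment and the first word of the split method name both name the method's action; the helper names `split_identifier`, `first_word`, and `mine_pairs` and the sample data are illustrative and are not taken from the paper.

```python
import re
from collections import Counter


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def first_word(comment):
    """Return the first alphabetic word of a comment, lowercased, or None."""
    words = re.findall(r"[A-Za-z]+", comment)
    return words[0].lower() if words else None


def mine_pairs(methods):
    """Count (comment word, name word) pairs that differ.

    `methods` is an iterable of (leading_comment, method_name) tuples.
    Assumption: the first word on each side describes the action.
    """
    counts = Counter()
    for comment, name in methods:
        comment_word = first_word(comment)
        name_words = split_identifier(name)
        if not comment_word or not name_words:
            continue
        name_word = name_words[0]
        if comment_word != name_word:
            counts[(comment_word, name_word)] += 1
    return counts


# Hypothetical sample data, for illustration only.
SAMPLE_METHODS = [
    ("Delete the given entry from the cache.", "removeEntry"),
    ("Erase all stored items.", "clearItems"),
    ("Delete a user record.", "removeUser"),
    ("Add a listener.", "addListener"),
]

if __name__ == "__main__":
    for pair, count in mine_pairs(SAMPLE_METHODS).most_common():
        print(pair, count)
```

Pairs that recur across many methods, such as `("delete", "remove")` in the sample data, would be candidates for software-specific similar words. A realistic pipeline would likely need part-of-speech tagging and stemming (for example, to relate "Deletes" to "delete") and a filter against an existing resource such as WordNet, none of which this sketch attempts.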