Supporting concept location through identifier parsing and ontology extraction

Authors:
Surafel Lemma Abebe;Anita Alicante;Anna Corazza;Paolo Tonella
Venue:
Journal of Systems and Software
Year:
2013

Abstract

Identifier names play a key role in program understanding, and in particular in concept location. Programmers can easily "parse" identifiers and understand their intended meaning. This, however, is not trivial for tools that try to exploit the information in identifiers to support program understanding. To address this problem, we resort to natural language analyzers, which parse tokenized identifier names and provide the syntactic relationships (dependencies) among the terms composing the identifiers. These relationships are then mapped to semantic relationships. In this study, we evaluated the use of off-the-shelf and trained natural language analyzers to parse identifier names, extract an ontology, and use it to support concept location. In the evaluation, we assessed whether the concepts taken from the ontology can be used to improve the efficiency of the queries used in concept location. We also investigated whether the use of different natural language analyzers affects the extracted ontology and the support it provides to concept location. Results show that using concepts from the ontology significantly improves the efficiency of concept location queries (in some cases, an improvement of 127% is observed). The results also indicate that the efficiency of concept location queries is not affected by the differences among the ontologies produced by different analyzers.
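
The abstract takes tokenized identifier names as the input to the natural language analyzers but does not say how the tokenization is done. As a rough illustration only, the sketch below splits identifiers on underscores, camelCase boundaries, and digit runs. The function name `split_identifier` and the splitting rules are assumptions made for this example, not the authors' method.

```python
import re
from typing import List


def split_identifier(identifier: str) -> List[str]:
    """Split an identifier into lowercase terms.

    Hypothetical helper: the splitting rules here are an assumption,
    not the tokenizer used in the paper.
    """
    terms: List[str] = []
    # First break on underscores and any other non-alphanumeric characters.
    for chunk in re.split(r"[_\W]+", identifier):
        # Within each chunk, pick out in order of preference:
        #   - an acronym not followed by a lowercase letter (XML in XMLParser)
        #   - a capitalized or all-lowercase word (Parser, get)
        #   - a run of digits
        terms.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [term.lower() for term in terms]


if __name__ == "__main__":
    for name in ("getFileName", "XMLParser", "max_buffer_size2"):
        print(name, "->", " ".join(split_identifier(name)))
```

The resulting term sequence (for example "get file name") is the kind of input a natural language parser could then analyze for dependencies among terms. Abbreviation expansion and dictionary-based splitting of same-case identifiers are left out of this sketch.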