Several recent static analysis techniques automate software understanding activities by extracting textual information from source code and applying information retrieval models to the extracted corpora. These source code retrieval techniques show efficacy, but the literature provides no guidance regarding configuration of their constituent processes. For example, the literature provides conflicting information regarding the benefit of extracting comments and string literals along with identifiers such as method or variable names. In this paper we present an initial investigation into the similarities between three source code lexicons described in the literature: identifiers, comments, and string literals. We address three research questions using a case study of six open source Java projects. The results indicate that methods uniquely contain from 30% to 60% of the projects' terms, whereas the comments uniquely contain from 22% to 45% of the terms. Future work includes analyzing the extent to which comments and string literals introduce domain terms rather than non-domain terms.
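The comparison of the three lexicons can be illustrated with a minimal sketch. It assumes each lexicon has already been extracted and reduced to a set of lowercase terms, and it reads "uniquely contain" as the share of a project's combined vocabulary that appears in exactly one lexicon; that reading is an assumption, not the paper's stated definition. The function names `split_identifier` and `unique_term_shares`, and the sample terms, are hypothetical.

```python
import re


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    # Acronym runs (XML), capitalized or lowercase words (File, parse), digits.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def unique_term_shares(lexicons):
    """Map each lexicon name to the fraction of the combined vocabulary
    that occurs in that lexicon and in no other (assumed interpretation)."""
    vocabulary = set().union(*lexicons.values())
    shares = {}
    for name, terms in lexicons.items():
        # Union of every lexicon except the current one.
        others = set().union(*(t for n, t in lexicons.items() if n != name))
        shares[name] = len(terms - others) / len(vocabulary) if vocabulary else 0.0
    return shares


# Hypothetical toy data standing in for one project's extracted terms.
lexicons = {
    "identifiers": set(split_identifier("parseXMLFile")) | {"buffer"},
    "comments": {"parse", "file", "deprecated", "todo"},
    "literals": {"xml", "error"},
}
shares = unique_term_shares(lexicons)
```

With this toy data the combined vocabulary has seven terms, of which one is unique to identifiers, two to comments, and one to string literals. A real study would also need decisions about stop words, stemming, and tokenization of comments and literals, none of which are modeled here.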