Developers often embed information about the problem domain of the software, and about the solution it implements, in comments and identifiers. When using software developed by others, or when they are new to a project, programmers know little about how domain information is reflected in the source code. They often learn about the domain from external sources such as books and articles. Hence, it is important that comments and identifiers use terms that are commonly known in the domain literature, as programmers are likely to use such terms when searching the source code. The paper presents a case study that investigated how domain terms are used in comments and identifiers. The study focused on three research questions: (1) to what degree are domain terms found in the source code of software from a particular problem domain; (2) which is the preponderant source of domain terms, identifiers or comments; and (3) to what degree are domain terms shared between several systems from the same problem domain? Within the studied software, we found that, on average, 42% of the domain terms were used in the source code; 23% of the domain terms used in the source code were present in comments only, whereas only 11% were present in identifiers only; and there was a 63% agreement in the use of domain terms between any two software systems.
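The three measurements reported above can be illustrated with a minimal Python sketch. It is not the study's tooling: the function names (`split_identifier`, `term_usage`, `agreement`), the restriction to single-word domain terms, and the sample vocabulary are all assumptions made for illustration. In particular, the abstract does not define how agreement was computed, so the sketch assumes the fraction of domain terms on which two systems make the same use or non-use decision.

```python
import re


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def term_usage(domain_terms, comment_words, identifier_words):
    """Measure how a set of (single-word) domain terms appears in one system.

    Returns the used terms, the share of domain terms used at all, and the
    shares of used terms found only in comments or only in identifiers.
    """
    domain = set(domain_terms)
    in_comments = domain & set(comment_words)
    in_identifiers = domain & set(identifier_words)
    used = in_comments | in_identifiers
    n_used = len(used)
    return {
        "used": used,
        "coverage": n_used / len(domain) if domain else 0.0,
        "comments_only": len(in_comments - in_identifiers) / n_used if n_used else 0.0,
        "identifiers_only": len(in_identifiers - in_comments) / n_used if n_used else 0.0,
    }


def agreement(domain_terms, used_a, used_b):
    """Assumed definition: share of domain terms that both systems use,
    or that both systems leave unused."""
    domain = set(domain_terms)
    if not domain:
        return 0.0
    same = sum((t in used_a) == (t in used_b) for t in domain)
    return same / len(domain)


# Hypothetical vocabulary for a graph-related domain.
domain = {"graph", "vertex", "edge", "path", "cycle"}
comments_a = {"graph", "path", "the"}
identifiers_a = {w for name in ["addVertex", "graph_size"] for w in split_identifier(name)}

usage_a = term_usage(domain, comments_a, identifiers_a)
score = agreement(domain, usage_a["used"], {"graph", "edge"})
```

In this toy data, "path" occurs only in comments and "vertex" only in identifiers, while "graph" occurs in both, which is the kind of split the second research question measures. A real study would also need to handle multi-word domain terms, stemming, and abbreviations, none of which the sketch attempts.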