Semantic clustering: Identifying topics in source code
Information and Software Technology
Understanding a software system by analyzing only its structure reveals just half of the picture: the structure tells us how the code works, but not what the code is about. What the code is about can be found in the semantics of the source code, that is, in the names of identifiers, in comments, and so on. In this paper, we analyze how these terms are spread over the source artifacts using Latent Semantic Indexing, an information retrieval technique. We assume that parts of the system that use similar terms are related. We cluster artifacts that use similar terms, and we reveal the most relevant terms for the computed clusters. Our approach works at the level of the source code, which makes it language independent. Nevertheless, we correlated the semantics with structural information, and we applied the approach at different levels of abstraction (e.g., classes, methods). We applied our approach to three large case studies and we report the results we obtained.
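The pipeline outlined in the abstract (extract terms, apply Latent Semantic Indexing, cluster artifacts that use similar terms, label each cluster with its most relevant terms) can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes NumPy, raw term counts without any weighting, a simple average-linkage clustering with a similarity threshold, and labels ranked by the dot product between term vectors and the cluster centroid. The four toy "classes", the function names, the threshold, and the choice of k = 2 are all hypothetical.

```python
import re
import numpy as np

def tokenize(text):
    """Split identifiers and comments into lowercase terms (camelCase aware)."""
    terms = []
    for word in re.findall(r"[A-Za-z]+", text):
        terms += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word)
    return [t.lower() for t in terms]

def lsi(docs, k):
    """Return (vocabulary, term vectors, document vectors) in a k-dimensional LSI space."""
    tokens = [tokenize(text) for text in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))  # term-document matrix (raw counts)
    for j, doc in enumerate(tokens):
        for t in doc:
            counts[index[t], j] += 1
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    return vocab, u[:, :k] * s[:k], vt[:k].T * s[:k]

def cosine_matrix(vectors):
    """Pairwise cosine similarity between row vectors."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T

def cluster(sim, threshold):
    """Average-linkage agglomerative clustering; stops when no pair exceeds threshold."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best, pair = threshold, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                avg = sim[np.ix_(clusters[a], clusters[b])].mean()
                if avg > best:
                    best, pair = avg, (a, b)
        if pair is None:
            break
        a, b = pair
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    return clusters

def top_terms(vocab, term_vecs, doc_vecs, members, n):
    """Rank terms by their projection onto the centroid of a cluster (an assumption)."""
    centroid = doc_vecs[members].mean(axis=0)
    order = np.argsort(-(term_vecs @ centroid))[:n]
    return [vocab[i] for i in order]

# Hypothetical artifacts: identifier and comment text of four toy classes.
names = ["XmlParser", "XmlWriter", "ShapeRenderer", "CanvasView"]
sources = [
    "parseNode parseAttribute xmlDocument // parse xml node",
    "writeNode writeAttribute xmlDocument // write xml node",
    "drawCircle drawRectangle shapeCanvas // draw canvas shape",
    "paintShape repaintCanvas drawBorder // paint canvas shape",
]

vocab, term_vecs, doc_vecs = lsi(sources, k=2)
clusters = cluster(cosine_matrix(doc_vecs), threshold=0.5)
for members in clusters:
    print([names[i] for i in members], top_terms(vocab, term_vecs, doc_vecs, members, 3))
```

Because the sketch reads only identifier and comment text, nothing in it depends on the grammar of a particular programming language, which mirrors the language-independence claim above. In a real system the number of dimensions k, the term weighting, and the clustering threshold would all need tuning.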