Many existing approaches in software comprehension focus on program structure or external documentation. However, analyzing only this formal information overlooks the informal semantics contained in the vocabulary of source code. To understand software as a whole, we need to enrich software analysis with the developer knowledge hidden in the naming used in code. This paper proposes the use of information retrieval to exploit linguistic information found in source code, such as identifier names and comments. We introduce Semantic Clustering, a technique based on Latent Semantic Indexing and clustering that groups source artifacts using similar vocabulary. We call these groups semantic clusters and interpret them as linguistic topics that reveal the intention of the code. We compare the topics to each other, identify links between them, provide automatically retrieved labels, and use a visualization to illustrate how they are distributed over the system. Our approach is language independent as it works at the level of identifier names. To validate our approach, we applied it to several case studies, two of which we present in this paper.

Note: Some of the visualizations presented make heavy use of colors. Please obtain a color copy of the article for better understanding.
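
To make the idea of grouping source artifacts by shared vocabulary concrete, here is a minimal Python sketch. It is not the paper's implementation: it omits the Latent Semantic Indexing step entirely and compares raw term-frequency vectors with cosine similarity, then groups artifacts with a simple single-link threshold. The class names, identifiers, the `0.3` threshold, and all function names are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def term_vector(identifiers):
    """Count how often each term occurs across an artifact's identifiers."""
    counts = Counter()
    for ident in identifiers:
        counts.update(split_identifier(ident))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold=0.3):
    """Single-link grouping: artifacts whose similarity reaches the
    (assumed) threshold end up in the same group."""
    names = list(vectors)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


def label(group, vectors, k=3):
    """Label a group with its k most frequent terms."""
    total = Counter()
    for n in group:
        total.update(vectors[n])
    return [term for term, _ in total.most_common(k)]


# Hypothetical artifacts: class name -> identifiers found in its source.
files = {
    "HttpClient": ["sendRequest", "parseResponse", "httpHeader"],
    "HttpServer": ["handleRequest", "writeResponse", "httpHeader"],
    "ShapeRenderer": ["drawCircle", "fillPolygon", "shapeColor"],
    "Canvas": ["drawLine", "fillRect", "canvasColor"],
}

vectors = {name: term_vector(idents) for name, idents in files.items()}
groups = cluster(vectors)
for g in groups:
    print(g, label(g, vectors))
```

In this toy data the two networking classes share the terms `request`, `response`, `http`, and `header`, while the two drawing classes share `draw`, `fill`, and `color`, so they fall into two separate groups. A version closer to the technique described above would first project the term-document matrix into a lower-dimensional space via singular value decomposition before measuring similarity.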