Abstract: This paper describes the results of applying Latent Semantic Analysis (LSA), an advanced information retrieval method, to program source code and its associated documentation. LSA is a corpus-based statistical method for inducing and representing aspects of the meaning of words and passages of natural language as reflected in their usage. The method is assessed for application to the domain of software components (i.e., source code and its accompanying documentation). Here, LSA serves as the basis for clustering software components, and the resulting clusters are used to assist in understanding a nontrivial software system, namely a version of Mosaic. Applying LSA to source code and internal documentation in support of program understanding is a new application of the method and a departure from its usual domain of natural language.
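The pipeline the abstract outlines (build a term-by-document matrix from software components, reduce it with LSA, then cluster components by similarity) can be illustrated with a minimal sketch. The abstract does not state the term weighting, the number of dimensions, or the clustering procedure, so everything below is an assumption: the file names and their text are hypothetical, raw term counts are used, k = 2 dimensions are kept, and a greedy cosine-threshold grouping stands in for whatever clustering the authors applied. Power iteration on the document Gram matrix replaces a library SVD routine so that the example runs with the standard library alone.

```python
import math

# Hypothetical toy "components": identifier and comment words per source file.
components = {
    "html_parser.c": "parse html tag parse anchor",
    "html_render.c": "html tag render anchor",
    "net_connect.c": "socket connect http request",
    "net_send.c": "http request socket send",
}

def term_document_matrix(texts):
    """Rows are terms, columns are documents, entries are raw term counts."""
    vocab = sorted({w for t in texts for w in t.split()})
    return [[t.split().count(w) for t in texts] for w in vocab]

def top_eigenpairs(gram, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(gram)
    g = [row[:] for row in gram]
    pairs = []
    for _component in range(k):
        v = [1.0 / math.sqrt(n)] * n
        for _step in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * sum(g[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: remove the component just found before searching for the next.
        for i in range(n):
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs

def lsa_document_vectors(matrix, k):
    """Document coordinates in a k-dimensional LSA space.

    With A = U S V^T, the Gram matrix A^T A = V S^2 V^T, so its eigenvectors
    are the right singular vectors and its eigenvalues the squared singular
    values. Document j is represented by (s_1 * V[j][1], ..., s_k * V[j][k]).
    """
    n = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
            for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs]
            for j in range(n)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def cluster(vectors, threshold):
    """Greedy grouping: join the first cluster whose seed is similar enough."""
    clusters = []
    for idx, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vectors[members[0]], vec) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters

names = list(components)
matrix = term_document_matrix(list(components.values()))
vectors = lsa_document_vectors(matrix, k=2)
clusters = cluster(vectors, threshold=0.7)
for members in clusters:
    print([names[i] for i in members])
```

On this toy input the two HTML-related files and the two network-related files fall into separate groups. A real corpus would typically call for term weighting, a larger k, and a proper SVD implementation, none of which are specified in the abstract.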