In this paper, we propose a technique called LACT for automatically categorizing software systems in open-source repositories. LACT is based on Latent Dirichlet Allocation (LDA), an information retrieval method that is used to index and analyze source code documents as mixtures of probabilistic topics. For an initial evaluation, we performed two studies. In the first study, LACT was compared against an existing tool, MUDABlue, for classifying 41 software systems written in C into problem domain categories. The results indicate that LACT can automatically produce meaningful category names and yield classification results comparable to those of MUDABlue. In the second study, we applied LACT to 43 software systems written in different programming languages, such as C/C++, Java, C#, PHP, and Perl. The results indicate that LACT can be used effectively for the automatic categorization of software systems regardless of the underlying programming language or paradigm. Moreover, both studies indicate that LACT can identify several new categories based on libraries, architectures, or programming languages, which is a promising improvement over manual categorization and existing techniques.
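The general idea of treating source code as mixtures of topics can be illustrated with a minimal sketch. This is not LACT's implementation: the abstract does not specify preprocessing, hyperparameters, or how categories are named, so everything below is an assumption made for illustration. The sketch fits a small collapsed Gibbs sampler for Latent Dirichlet Allocation to identifier tokens, then assigns each system to its dominant topic and labels that topic with its most frequent words. The names `tokenize`, `lda_gibbs`, `categorize`, and the toy `SYSTEMS` data are hypothetical.

```python
import random
import re


def tokenize(source):
    """Split source text into lowercase word tokens (assumed preprocessing)."""
    words = []
    for ident in re.findall(r"[A-Za-z]+", source):
        # Split camelCase identifiers, e.g. parseHttpRequest -> parse, http, request.
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", ident)
        words.extend(p.lower() for p in parts)
    return [w for w in words if len(w) > 2]


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; alpha, beta, iterations are assumed values."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta) / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    # Per-document topic proportions; each row sums to 1.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    # Three most frequent words per topic, used here as a category label.
    top_words = [
        [vocab[v] for v in sorted(range(V), key=lambda v: -topic_word[k][v])[:3]]
        for k in range(num_topics)
    ]
    return theta, top_words


def categorize(systems, num_topics):
    """Assign each system to its dominant topic and label it with top words."""
    names = list(systems)
    docs = [tokenize(systems[name]) for name in names]
    theta, top_words = lda_gibbs(docs, num_topics)
    categories = {}
    for name, row in zip(names, theta):
        topic = max(range(num_topics), key=lambda k: row[k])
        categories[name] = (topic, " ".join(top_words[topic]))
    return categories, theta, top_words


# Hypothetical toy "systems"; real input would be whole source trees.
SYSTEMS = {
    "tiny_httpd": "handleHttpRequest parseHttpHeader sendHttpResponse serverSocket",
    "web_proxy": "forwardHttpRequest cacheHttpResponse proxySocket serverSocket",
    "mini_sql": "parseSqlQuery executeQuery tableIndex rowCursor",
    "kv_store": "tableIndex rowCursor commitTransaction executeQuery",
}

categories, theta, top_words = categorize(SYSTEMS, num_topics=2)
```

Labeling a topic with its most frequent words is only one simple way to obtain a category name, and the sampled topics depend on the seed and hyperparameters, so the grouping on such a small corpus should not be read as representative of the results reported in the studies.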