MUDABlue: an automatic categorization system for open source repositories
Journal of Systems and Software - Special issue: Selected papers from the 11th Asia Pacific software engineering conference (APSEC 2004)
Open Source communities typically use a software repository to archive software projects along with their source code, mailing list discussions, documentation, bug reports, and so forth. For example, SourceForge currently hosts over seventy thousand Open Source software systems. Because such repositories hold so much information, they offer numerous opportunities for sharing information among projects. For example, one might want to find the set of projects that are related or similar to each other, so that the project groups can collaborate and share their work. With thousands of projects in a typical repository, however, manually locating related projects can be difficult. Hence, we propose MUDABlue, a tool that automatically categorizes software systems. MUDABlue has three major characteristics: 1) it relies on no information other than the source code, 2) it determines category sets automatically, and 3) it allows a software system to be a member of multiple categories. MUDABlue has a web interface that visualizes the determined categories, which eases browsing of a software repository. We show the effectiveness of MUDABlue's categorization capability by comparing its generated categories with those of existing research tools.
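The three characteristics listed in the abstract can be illustrated with a toy sketch. This is not MUDABlue's algorithm, which the abstract does not describe; it is a deliberately simple stand-in that assumes a category can be approximated by an identifier shared by some, but not all, projects. The function names, the `min_projects` threshold, and the sample projects are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: treats anything shaped like a C-style identifier as a token.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def identifiers(source: str) -> set[str]:
    """Collect lower-cased identifiers longer than two characters."""
    return {tok.lower() for tok in IDENT.findall(source) if len(tok) > 2}


def categorize(projects: dict[str, str], min_projects: int = 2) -> dict[str, set[str]]:
    """Group projects by shared identifiers (a toy stand-in, not MUDABlue itself).

    1) Only source text is used as input.
    2) Categories are not predefined; they emerge from the identifiers found.
    3) A project may appear in several categories.
    """
    owners: dict[str, set[str]] = defaultdict(set)
    for name, source in projects.items():
        for ident in identifiers(source):
            owners[ident].add(name)

    total = len(projects)
    # Keep identifiers shared by several projects, but drop those present in
    # every project, since they do not discriminate between projects.
    return {
        ident: members
        for ident, members in owners.items()
        if min_projects <= len(members) < total
    }


# Made-up sample data for illustration only.
example = {
    "chat": "int socket_open; int send_message;",
    "ftp": "int socket_open; int list_files;",
    "viewer": "int list_files; int draw_image;",
}
categories = categorize(example)
```

In this sample, `ftp` ends up in two categories (`socket_open` and `list_files`), which mirrors the multiple-membership property, while `int` is discarded because every project contains it.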