The number of software systems is increasing rapidly. For example, SourceForge currently has about sixty thousand registered software systems, twenty-two thousand of which were added in the past twelve months. For software evolution, it is important to be able to search for and reuse similar existing software systems in a software archive. The evolution history of a similar existing system is useful, and we may even evolve a new system from an existing one instead of creating it from scratch. In this paper, we propose an automatic software categorization algorithm to help find similar software systems in a software archive. At present, we leave open the question of the nature of the categorization and explore several known approaches, including a code-clone-based similarity metric, decision trees, and latent semantic analysis. The results of applying each approach give us some insight into the problem space and set some directions for further work.
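
To make the idea of a clone-based similarity metric concrete, the following is a minimal sketch and not the metric used in the paper. It assumes that two systems are similar when they share many runs of identical consecutive lines, and it scores them by the Jaccard overlap of those runs. The function names (`normalized_lines`, `line_windows`, `clone_similarity`) and the window size `k` are hypothetical choices made for illustration only.

```python
from typing import Iterable, List, Set, Tuple

Window = Tuple[str, ...]


def normalized_lines(source: str) -> List[str]:
    """Strip surrounding whitespace and drop blank lines (assumed normalization)."""
    return [line.strip() for line in source.splitlines() if line.strip()]


def line_windows(source: str, k: int = 3) -> Set[Window]:
    """Return every run of k consecutive normalized lines in one source file."""
    lines = normalized_lines(source)
    return {tuple(lines[i:i + k]) for i in range(len(lines) - k + 1)}


def system_windows(files: Iterable[str], k: int = 3) -> Set[Window]:
    """Collect the k-line windows of all files belonging to one system."""
    windows: Set[Window] = set()
    for source in files:
        windows |= line_windows(source, k)
    return windows


def clone_similarity(system_a: Iterable[str], system_b: Iterable[str], k: int = 3) -> float:
    """Jaccard overlap of shared k-line windows: 0.0 (nothing shared) to 1.0 (identical)."""
    windows_a = system_windows(system_a, k)
    windows_b = system_windows(system_b, k)
    union = windows_a | windows_b
    if not union:
        # Neither system has a run of k lines, so there is nothing to compare.
        return 0.0
    return len(windows_a & windows_b) / len(union)


if __name__ == "__main__":
    # Two toy "systems", each given as a list of file contents.
    a = ["x = 1\ny = 2\nz = 3\nw = 4\n"]
    b = ["x = 1\n  y = 2\nz = 3\nq = 5\n"]
    print(clone_similarity(a, b))
```

A pairwise score of this kind could feed a clustering step that groups systems into categories. A real clone detector would typically work on token sequences rather than raw lines, so that renamed identifiers still match.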