Automatic Categorization Algorithm for Evolvable Software Archive

Authors:
Shinji Kawaguchi; Pankaj K. Garg; Makoto Matsushita; Katsuro Inoue
Venue:
IWPSE '03 Proceedings of the 6th International Workshop on Principles of Software Evolution
Year:
2003

Citing 0
Cited 5

MUDABlue: an automatic categorization system for open source repositories
Journal of Systems and Software - Special issue: Selected papers from the 11th Asia Pacific software engineering conference (APSEC 2004)

An empirical study of rules for well-formed identifiers: Research Articles
Journal of Software Maintenance and Evolution: Research and Practice - Source Code Analysis and Manipulation (SCAM 2006)

Quantifying identifier quality: an analysis of trends
Empirical Software Engineering

Classification of software artifacts based on structural information
KES'10 Proceedings of the 14th international conference on Knowledge-based and intelligent information and engineering systems: Part IV

Mega software engineering
PROFES'05 Proceedings of the 6th international conference on Product Focused Software Process Improvement

Abstract

The number of software systems is increasing at a rapid rate. For example, SourceForge currently has about sixty thousand software systems registered, twenty-two thousand of which were added in the past twelve months. For software evolution, it is important to be able to search for and reuse similar existing software systems in a software archive. The evolution history of a similar existing system is also useful, and we may even evolve a software system from an existing one instead of creating it from scratch. In this paper, we propose an automatic software categorization algorithm to help find similar software systems in a software archive. At present, we leave open the question of what form the categorization should take, and explore several known approaches, including a code-clone-based similarity metric, decision trees, and latent semantic analysis. The results of applying each approach give us some insight into the problem space and set some directions for further work.