Software repositories hold applications that are often categorized to improve the effectiveness of various maintenance tasks. Properly categorized applications allow stakeholders to identify requirements related to their applications and to predict maintenance problems in software projects. Unfortunately, for various legal and organizational reasons, source code is often unavailable, which makes it difficult to automatically categorize applications that exist only as binary executables. In this paper, we propose a novel approach that uses Application Programming Interface (API) calls from third-party libraries as attributes for automatically categorizing the software applications that make these calls. API calls can be extracted from source code and, more importantly, from the bytecode of applications, which makes automatic categorization applicable to closed-source repositories. We evaluate our approach along with other machine learning algorithms for software categorization on two large Java repositories: an open-source repository containing 3,286 projects and a closed-source repository containing 745 applications. Our contribution is twofold: we propose a new approach that makes it possible to categorize software projects without any source code, using a small number of API calls as attributes, and we carry out the first comprehensive empirical evaluation of automatic categorization approaches.
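To make the idea of "API calls as attributes" concrete, the following is a minimal sketch, not the classifier evaluated in the paper. It assumes the API calls of each application have already been extracted (from source code or bytecode) into a set of names, and it substitutes a simple Bernoulli naive Bayes classifier for illustration. The training data, category labels, and the function names `train` and `classify` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(apps):
    """Build a model from (set_of_api_calls, category) pairs."""
    cat_counts = Counter()            # number of applications per category
    api_counts = defaultdict(Counter)  # per category: how many apps use each API
    vocab = set()                     # every API call seen during training
    for apis, cat in apps:
        cat_counts[cat] += 1
        for api in apis:
            api_counts[cat][api] += 1
            vocab.add(api)
    return {"cats": cat_counts, "apis": api_counts, "vocab": vocab, "n": len(apps)}


def classify(model, apis):
    """Return the most probable category for an application's API-call set."""
    best_cat, best_score = None, -math.inf
    for cat, n_cat in model["cats"].items():
        score = math.log(n_cat / model["n"])  # log prior of the category
        for api in model["vocab"]:
            # Laplace-smoothed probability that an app in `cat` uses `api`.
            p = (model["apis"][cat][api] + 1) / (n_cat + 2)
            score += math.log(p if api in apis else 1 - p)
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat


# Hypothetical training data: API-call sets labeled with made-up categories.
TRAINING = [
    ({"java.sql.Connection", "java.sql.Statement"}, "database"),
    ({"java.sql.Connection", "java.sql.ResultSet"}, "database"),
    ({"java.net.Socket", "java.net.URL"}, "networking"),
    ({"java.net.Socket", "java.net.ServerSocket"}, "networking"),
]

model = train(TRAINING)
unknown_app = {"java.sql.Connection", "java.sql.ResultSet", "java.util.List"}
predicted = classify(model, unknown_app)
print(predicted)
```

Each application is represented only by the presence or absence of known API calls, so the same representation applies whether the calls were read from source code or from bytecode. API calls never seen in training (here `java.util.List`) are simply ignored.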