Categorizing software applications for maintenance

Authors:
Collin McMillan;Mario Linares-Vasquez;Denys Poshyvanyk;Mark Grechanik
Affiliations:
Department of Computer Science, The College of William and Mary, Williamsburg, Virginia, USA;Department of Computer Science, Universidad Nacional de Colombia, Bogotá, Colombia;Department of Computer Science, The College of William and Mary, Williamsburg, Virginia, USA;Accenture Technology Labs and The University of Illinois at Chicago, USA
Venue:
ICSM '11 Proceedings of the 2011 27th IEEE International Conference on Software Maintenance
Year:
2011

Citing 0
Cited 3

Detecting similar software applications
Proceedings of the 34th International Conference on Software Engineering

Labeled topic detection of open source software from mining mass textual project profiles
Proceedings of the First International Workshop on Software Mining

Tag recommendation for open source software
Frontiers of Computer Science: Selected Publications from Chinese Universities

Abstract

Software repositories hold applications that are often categorized to improve the effectiveness of various maintenance tasks. Properly categorized applications allow stakeholders to identify requirements related to their applications and to predict maintenance problems in software projects. Unfortunately, for various legal and organizational reasons, source code is often unavailable, which makes it difficult to automatically categorize the binary executables of software applications. In this paper, we propose a novel approach in which Application Programming Interface (API) calls from third-party libraries serve as attributes for automatically categorizing the software applications that use these calls. API calls can be extracted from source code and, more importantly, from the bytecode of applications, which makes automatic categorization applicable to closed-source repositories. We evaluate our approach along with other machine learning algorithms for software categorization on two large Java repositories: an open-source repository containing 3,286 projects and a closed-source repository containing 745 applications. Our contribution is twofold: we propose a new approach that makes it possible to categorize software projects without any source code, using a small number of API calls as attributes, and we carry out the first comprehensive empirical evaluation of automatic categorization approaches.
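
To make the idea of API calls as categorization attributes concrete, the following is a minimal sketch, not the authors' implementation. It assumes each application has already been reduced to the set of API calls it makes (for example, extracted from bytecode), and it uses a small Bernoulli naive Bayes classifier purely for illustration; the abstract does not say which learning algorithms were used. The API names, category labels, and the functions `train` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(apps):
    """Build a Bernoulli naive Bayes model from (api_call_set, category) pairs."""
    vocab = sorted({api for calls, _ in apps for api in calls})
    cat_counts = Counter(cat for _, cat in apps)
    api_counts = defaultdict(Counter)  # category -> API call -> number of apps using it
    for calls, cat in apps:
        api_counts[cat].update(calls)
    return vocab, cat_counts, api_counts, len(apps)


def classify(model, calls):
    """Return the most probable category for an application's set of API calls."""
    vocab, cat_counts, api_counts, total = model
    best_cat, best_score = None, -math.inf
    for cat, n in cat_counts.items():
        score = math.log(n / total)  # log prior of the category
        for api in vocab:
            # Laplace-smoothed probability that an app in this category uses the call
            p = (api_counts[cat][api] + 1) / (n + 2)
            score += math.log(p if api in calls else 1 - p)
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat


# Hypothetical training data: each application is a set of API calls plus a label.
TRAINING = [
    ({"java.sql.Connection.prepareStatement", "java.sql.ResultSet.next"}, "database"),
    ({"java.sql.DriverManager.getConnection", "java.sql.ResultSet.next"}, "database"),
    ({"java.net.Socket.connect", "java.net.URL.openStream"}, "networking"),
    ({"java.net.Socket.connect", "java.net.ServerSocket.accept"}, "networking"),
]

model = train(TRAINING)
# Calls never seen in training (here java.io.File.exists) are simply ignored.
predicted = classify(model, {"java.sql.ResultSet.next", "java.io.File.exists"})
print(predicted)  # prints: database
```

Because only the presence or absence of each call matters in this sketch, the same attribute vector can be built whether the calls come from source code or from compiled bytecode, which is the property the abstract relies on for closed-source repositories.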