Categorizing software applications for maintenance

  • Authors:
  • Collin McMillan; Mario Linares-Vasquez; Denys Poshyvanyk; Mark Grechanik

  • Affiliations:
  • Department of Computer Science, The College of William and Mary, Williamsburg, Virginia, USA; Department of Computer Science, Universidad Nacional de Colombia, Bogotá, Colombia; Department of Computer Science, The College of William and Mary, Williamsburg, Virginia, USA; Accenture Technology Labs and The University of Illinois at Chicago, USA

  • Venue:
  • ICSM '11 Proceedings of the 2011 27th IEEE International Conference on Software Maintenance
  • Year:
  • 2011

Abstract

Software repositories hold applications that are often categorized to improve the effectiveness of various maintenance tasks. Properly categorized applications allow stakeholders to identify requirements related to their applications and to predict maintenance problems in software projects. Unfortunately, for various legal and organizational reasons, source code is often unavailable, which makes it difficult to automatically categorize applications that exist only as binary executables. In this paper, we propose a novel approach that uses Application Programming Interface (API) calls from third-party libraries as attributes for automatically categorizing the software applications that make these calls. API calls can be extracted from source code and, more importantly, from the byte-code of applications, which makes automatic categorization applicable to closed-source repositories. We evaluate our approach along with other machine learning algorithms for software categorization on two large Java repositories: an open-source repository containing 3,286 projects and a closed-source one with 745 applications. Our contribution is twofold: we propose a new approach that makes it possible to categorize software projects without any source code, using a small number of API calls as attributes, and we carry out the first comprehensive empirical evaluation of automatic categorization approaches.
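
The idea of using API calls as categorization attributes can be illustrated with a minimal sketch. The abstract does not name the classifier or the attribute weighting used in the paper, so the code below assumes a simple stand-in: each application is a bag of API names, each category is the summed bag of its training applications, and an unknown application is assigned to the category with the highest cosine similarity. The function names (`train`, `categorize`), the category labels, and the tiny training set are all hypothetical; the API names are familiar JDK classes used only as placeholders for calls that would be extracted from source code or byte-code.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(count * b.get(name, 0) for name, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled_apps):
    """Build one summed API-count vector per category.

    labeled_apps: iterable of (category, list_of_api_names) pairs.
    This nearest-centroid scheme is an assumption for illustration,
    not the method evaluated in the paper.
    """
    centroids = {}
    for category, api_calls in labeled_apps:
        centroids.setdefault(category, Counter()).update(api_calls)
    return centroids


def categorize(centroids, api_calls):
    """Return the category whose centroid is most similar to the app."""
    vector = Counter(api_calls)
    return max(centroids, key=lambda c: cosine(vector, centroids[c]))


# Hypothetical training data: API names observed in already-labeled apps.
training = [
    ("database", ["java.sql.Connection", "java.sql.Statement", "java.sql.ResultSet"]),
    ("database", ["java.sql.Connection", "java.sql.PreparedStatement"]),
    ("networking", ["java.net.Socket", "java.net.URL", "java.io.InputStream"]),
    ("networking", ["java.net.ServerSocket", "java.net.Socket"]),
]
centroids = train(training)

# An unlabeled app, described only by the API names it references.
unknown = ["java.sql.Connection", "java.sql.ResultSet", "java.io.InputStream"]
predicted = categorize(centroids, unknown)
print(predicted)
```

Because the only input is a list of API names, the same sketch applies whether those names come from parsed source files or from references found in compiled class files, which is the property the abstract highlights for closed-source repositories.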