Automatic categorization of bug reports using latent Dirichlet allocation

  • Authors:
  • Kalyanasundaram Somasundaram;Gail C. Murphy

  • Affiliations:
  • Anna University;University of British Columbia

  • Venue:
  • Proceedings of the 5th India Software Engineering Conference
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Software developers, particularly in open-source projects, rely on bug repositories to organize their work. On a bug report, the component field is used to indicate to which team of developers a bug should be routed. Researchers have shown that incorrect categorization of newly received bug reports to components can cause potential delays in the resolution of bug reports. Approaches have been developed that consider the use of machine learning approaches, specifically Support Vector Machines (svm), to automatically categorize bug reports into the appropriate component to help streamline the process of solving a bug. One drawback of an SVM-based approach is that the results of categorization can be uneven across various components in the system if some components receive less reports than others. In this paper, we consider broadening the consistency of the recommendations produced by an automatic approach by investigating three approaches to automating bug report categorization: an approach similar to previous ones based on an SVM classifier and Term Frequency Inverse Document Frequency(svm-tf-idf), an approach using Latent Dirichlet Allocation (LDA) with SVM (svm-lda) and an approach using LDA and Kullback Leibler divergence (lda-kl). We found that lda-kl produced recalls similar to those found previously but with better consistency across all components for which bugs must be categorized.