Automatic categorization of bug reports using latent Dirichlet allocation

Authors:
Kalyanasundaram Somasundaram;Gail C. Murphy
Affiliations:
Anna University;University of British Columbia
Venue:
Proceedings of the 5th India Software Engineering Conference
Year:
2012

Abstract

Software developers, particularly in open-source projects, rely on bug repositories to organize their work. On a bug report, the component field indicates the team of developers to which a bug should be routed. Researchers have shown that assigning newly received bug reports to the wrong component can delay their resolution. Earlier approaches have applied machine learning, specifically Support Vector Machines (SVM), to automatically categorize bug reports into the appropriate component and so streamline the process of resolving a bug. One drawback of an SVM-based approach is that categorization results can be uneven across the components in a system if some components receive fewer reports than others. In this paper, we aim to improve the consistency of the recommendations produced by an automatic approach by investigating three approaches to automating bug report categorization: an approach similar to previous ones, based on an SVM classifier and Term Frequency-Inverse Document Frequency (svm-tf-idf); an approach using Latent Dirichlet Allocation (LDA) with SVM (svm-lda); and an approach using LDA and Kullback-Leibler divergence (lda-kl). We found that lda-kl produced recall similar to that reported previously, but with better consistency across all components for which bugs must be categorized.
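
The abstract does not spell out how lda-kl combines the two techniques, so the following is only a minimal sketch of one plausible reading: each component is summarized by the average topic distribution of its past reports, and a new report is routed to the component whose profile has the lowest Kullback-Leibler divergence from the report's own topic distribution. The function names, the averaging step, and the three-topic vectors are all assumptions made for illustration; in practice the topic distributions would come from a trained LDA model rather than being written by hand.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two discrete distributions.

    eps guards against log(0) when q has a zero entry (an assumption of this sketch).
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def component_profiles(topic_vectors_by_component):
    """Average the topic distributions of each component's past reports."""
    profiles = {}
    for component, vectors in topic_vectors_by_component.items():
        n = len(vectors)
        profiles[component] = [sum(column) / n for column in zip(*vectors)]
    return profiles


def rank_components(report_topics, profiles):
    """Order components from lowest to highest divergence from the report."""
    return sorted(
        profiles,
        key=lambda component: kl_divergence(report_topics, profiles[component]),
    )


# Hypothetical topic distributions over 3 topics, as if inferred by an LDA model.
history = {
    "UI": [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]],
    "Core": [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1]],
    "Build": [[0.1, 0.1, 0.8]],
}

profiles = component_profiles(history)
ranking = rank_components([0.15, 0.75, 0.10], profiles)
print(ranking)
```

Returning a full ranking rather than a single label makes it straightforward to recommend the top few components, which is how recall is usually reported for this kind of recommender. Note that a component with a single past report ("Build" here) still gets a usable profile, which hints at why a distribution-based comparison might behave more evenly across sparsely populated components than a classifier trained on imbalanced data.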