An Empirical Comparison of Machine Learning Techniques in Predicting the Bug Severity of Open and Closed Source Projects

Authors:
K. K. Chaturvedi; V. B. Singh
Affiliations:
Indian Agricultural Statistics Research Institute, New Delhi, Delhi, India; Delhi College of Arts & Commerce, University of Delhi, New Delhi, Delhi, India
Venue:
International Journal of Open Source Software and Processes
Year:
2012

Abstract

Bug severity is the degree of impact that a defect has on the development or operation of a component or system, and bugs can be classified into different severity levels according to that impact. Identifying the severity level can help a bug triager assign the bug to the appropriate bug fixer. Researchers have applied text mining techniques to predict bug severity, detect duplicate bug reports, and assign bugs to suitable fixers. This paper compares the performance of different machine learning techniques, namely Support Vector Machine (SVM), probability-based Naïve Bayes (NB), decision-tree-based J48 (a Java implementation of C4.5), rule-based Repeated Incremental Pruning to Produce Error Reduction (RIPPER), and Random Forests (RF), in predicting the severity level (1 to 5) of a reported bug from the summary, or short description, of the bug report. The bug report data were taken from NASA's PITS (Projects and Issue Tracking System) datasets as closed source projects and from components of the Eclipse, Mozilla, and GNOME datasets as open source projects. The analysis was carried out in the RapidMiner and STATISTICA data mining tools. The authors measured the performance of the machine learning techniques by considering (i) the accuracy and F-Measure for all severity levels and (ii) the number of best cases at different threshold levels of accuracy and F-Measure.
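To make the setup concrete, the following is a minimal, dependency-free sketch of one of the compared learners (Naïve Bayes) applied to bug summaries, together with the two evaluation measures named above: accuracy and per-level F-Measure. It is not the authors' RapidMiner/STATISTICA workflow. The tokenizer, the add-one smoothing, the choice to ignore unseen words, the convention of reporting F-Measure as 0 when precision and recall are both undefined, and the names `tokenize`, `MultinomialNB`, `accuracy`, and `f_measure_per_level` are all assumptions made for illustration. The toy summaries and their severity labels are invented and are not drawn from the PITS, Eclipse, Mozilla, or GNOME data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(summary):
    """Lower-case a bug summary and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", summary.lower())


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words counts, with add-one smoothing."""

    def fit(self, summaries, levels):
        self.class_counts = Counter(levels)        # severity level -> number of reports
        self.word_counts = defaultdict(Counter)    # severity level -> token frequencies
        self.vocab = set()
        for text, level in zip(summaries, levels):
            tokens = tokenize(text)
            self.word_counts[level].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {lvl: sum(self.word_counts[lvl].values())
                            for lvl in self.class_counts}
        self.n_reports = len(levels)
        return self

    def predict_one(self, summary):
        # Tokens never seen during training are ignored (an assumed convention).
        tokens = [t for t in tokenize(summary) if t in self.vocab]
        v = len(self.vocab)
        best_level, best_score = None, -math.inf
        for level, n in self.class_counts.items():
            # log P(level) + sum over tokens of smoothed log P(token | level)
            score = math.log(n / self.n_reports)
            denom = self.total_words[level] + v
            for t in tokens:
                score += math.log((self.word_counts[level][t] + 1) / denom)
            if score > best_score:
                best_level, best_score = level, score
        return best_level

    def predict(self, summaries):
        return [self.predict_one(s) for s in summaries]


def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted severity matches the true severity."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure_per_level(y_true, y_pred, levels):
    """F-Measure (harmonic mean of precision and recall) for each severity level."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for lvl in levels:
        tp = sum(t == lvl and p == lvl for t, p in pairs)
        fp = sum(t != lvl and p == lvl for t, p in pairs)
        fn = sum(t == lvl and p != lvl for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[lvl] = (2 * precision * recall / (precision + recall)
                       if precision + recall else 0.0)
    return scores


if __name__ == "__main__":
    # Invented toy summaries with hypothetical severity labels; not data from the paper.
    train = ["crash on startup", "crash when saving file",
             "typo in dialog text", "typo in menu label"]
    labels = [1, 1, 5, 5]
    model = MultinomialNB().fit(train, labels)
    test = ["crash when opening file", "typo in settings text"]
    preds = model.predict(test)
    print(preds, accuracy([1, 5], preds), f_measure_per_level([1, 5], preds, [1, 5]))
```

Swapping in any of the other learners named in the abstract (SVM, J48, RIPPER, RF) would change only the model; the accuracy and per-level F-Measure functions apply unchanged to its predictions.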