It's not a bug, it's a feature: how misclassification impacts bug prediction

Authors:
Kim Herzig; Sascha Just; Andreas Zeller
Affiliations:
Saarland University, Germany; Saarland University, Germany; Saarland University, Germany
Venue:
Proceedings of the 2013 International Conference on Software Engineering
Year:
2013

Abstract

In a manual examination of more than 7,000 issue reports from the bug databases of five open-source projects, we found 33.8% of all bug reports to be misclassified: that is, rather than referring to a code fix, they resulted in a new feature, an update to documentation, or an internal refactoring. This misclassification introduces bias into bug prediction models, confusing bugs and features: on average, 39% of files marked as defective actually never had a bug. We discuss the impact of this misclassification on earlier studies and recommend manual data validation for future studies.
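The two figures in the abstract come from comparing tracker labels with manually assigned labels. The sketch below illustrates that comparison. It is not the authors' tooling. The `IssueReport` record, its field names, and the category strings ("BUG", "FEATURE", "DOCUMENTATION", "REFACTORING") are hypothetical. The toy reports are invented purely to exercise the code and are not data from the study.

The sketch computes two values:
- The share of reports filed as bugs that manual inspection reclassifies.
- The share of files marked defective (via tracker-labelled bug reports) that no manually confirmed bug ever touched.

The paper reports the second figure as an average across projects; the sketch computes it for a single dataset.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class IssueReport:
    """Hypothetical issue-report record, for illustration only."""
    issue_id: str
    tracker_type: str           # category as filed in the bug tracker
    manual_type: str            # category assigned by manual inspection
    fixed_files: FrozenSet[str]  # files changed by commits linked to the issue


def misclassification_rate(reports: Iterable[IssueReport]) -> float:
    """Share of reports filed as BUG whose manual category is not BUG."""
    filed_bugs = [r for r in reports if r.tracker_type == "BUG"]
    if not filed_bugs:
        return 0.0
    wrong = sum(1 for r in filed_bugs if r.manual_type != "BUG")
    return wrong / len(filed_bugs)


def falsely_defective_share(reports: Iterable[IssueReport]) -> float:
    """Share of files marked defective via tracker BUG labels that were
    never touched by a manually confirmed bug."""
    marked: set = set()     # defective according to tracker labels
    confirmed: set = set()  # defective according to manual labels
    for r in reports:
        if r.tracker_type == "BUG":
            marked |= r.fixed_files
        if r.manual_type == "BUG":
            confirmed |= r.fixed_files
    if not marked:
        return 0.0
    return len(marked - confirmed) / len(marked)


# Invented toy data, not from the study.
reports = [
    IssueReport("I-1", "BUG", "BUG", frozenset({"a.c", "b.c"})),
    IssueReport("I-2", "BUG", "FEATURE", frozenset({"c.c"})),
    IssueReport("I-3", "BUG", "DOCUMENTATION", frozenset({"README"})),
    IssueReport("I-4", "BUG", "BUG", frozenset({"c.c"})),
]

print(misclassification_rate(reports))   # 0.5
print(falsely_defective_share(reports))  # 0.25
```

In this toy example, `c.c` is not counted as falsely defective even though one of its reports (I-2) was misclassified. That is because another, manually confirmed bug (I-4) also touched it. Only `README`, reached solely through a misclassified report, counts as a file that "never had a bug."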