Dealing with noise in defect prediction

Authors:
Sunghun Kim;Hongyu Zhang;Rongxin Wu;Liang Gong
Affiliations:
Hong Kong University of Science and Technology, Hong Kong, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China
Venue:
Proceedings of the 33rd International Conference on Software Engineering
Year:
2011

Abstract

Many software defect prediction models have been built using historical defect data obtained by mining software repositories (MSR). Recent studies have found that data collected this way contain noise, because current defect collection practices rely on optional bug-fix keywords or bug report links in change logs. This paper proposes approaches to deal with the noise in defect data. First, we measure the impact of noise on defect prediction models and provide guidelines for acceptable noise levels. We measure the noise resistance of two well-known defect prediction algorithms and find that, in general, for large defect datasets, adding false positive (FP) or false negative (FN) noise alone does not lead to substantial performance differences. However, prediction performance decreases significantly when the dataset contains 20%-35% of both FP and FN noise. Second, we propose a noise detection and elimination algorithm to address this problem. Our empirical study shows that the algorithm can identify noisy instances with reasonable accuracy. In addition, defect prediction accuracy improves after the noisy instances are eliminated using our algorithm.
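
The noise-injection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `inject_label_noise`, the label encoding (1 = buggy, 0 = clean), and the definition of each rate as a fraction of the corresponding true class are all assumptions made for illustration. The paper may define its noise rates differently.

```python
import random


def inject_label_noise(labels, fp_rate, fn_rate, seed=0):
    """Return a copy of `labels` with some entries flipped.

    Assumed encoding: 1 = buggy, 0 = clean.
    fp_rate: fraction of truly clean instances relabeled as buggy (FP noise).
    fn_rate: fraction of truly buggy instances relabeled as clean (FN noise).
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    clean_idx = [i for i, y in enumerate(labels) if y == 0]
    buggy_idx = [i for i, y in enumerate(labels) if y == 1]

    noisy = list(labels)
    # False positives: clean instances that end up marked as buggy.
    for i in rng.sample(clean_idx, round(fp_rate * len(clean_idx))):
        noisy[i] = 1
    # False negatives: buggy instances that end up marked as clean.
    for i in rng.sample(buggy_idx, round(fn_rate * len(buggy_idx))):
        noisy[i] = 0
    return noisy


if __name__ == "__main__":
    true_labels = [1] * 20 + [0] * 80
    noisy_labels = inject_label_noise(true_labels, fp_rate=0.25, fn_rate=0.25)
    flipped = sum(a != b for a, b in zip(true_labels, noisy_labels))
    print(flipped)  # 25: 20 false positives plus 5 false negatives
```

Training a classifier on `noisy_labels` and evaluating it against the original labels, at increasing values of `fp_rate` and `fn_rate`, would give one way to observe the kind of performance degradation the abstract reports.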