Many software defect prediction models have been built using historical defect data obtained by mining software repositories (MSR). Recent studies have found that data collected this way contains noise, because current defect collection practices rely on optional bug-fix keywords or bug report links in change logs. This paper proposes approaches to deal with the noise in defect data. First, we measure the impact of noise on defect prediction models and provide guidelines for acceptable noise levels. We measure the noise resistance of two well-known defect prediction algorithms and find that, in general, for large defect datasets, adding FP (false positive) or FN (false negative) noise alone does not lead to substantial performance differences. However, prediction performance decreases significantly when the dataset contains 20%-35% of both FP and FN noise. Second, we propose a noise detection and elimination algorithm to address this problem. Our empirical study shows that the algorithm can identify noisy instances with reasonable accuracy. In addition, defect prediction accuracy improves after the noisy instances it identifies are eliminated.
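The kind of noise-injection experiment described above can be illustrated with a minimal sketch. It is not the paper's procedure. It assumes that FP noise means a clean instance mislabeled as buggy and FN noise means a buggy instance mislabeled as clean, and that the rates are fractions of the clean and buggy instances respectively. The function names `inject_label_noise` and `noise_level` and the toy dataset are hypothetical.

```python
import random


def inject_label_noise(labels, fp_rate, fn_rate, seed=0):
    """Return a noisy copy of `labels` (True = buggy, False = clean).

    Assumption: FP noise flips clean instances to buggy, and FN noise
    flips buggy instances to clean. Each rate is applied to the size of
    the class being flipped, not to the whole dataset.
    """
    rng = random.Random(seed)
    clean_idx = [i for i, y in enumerate(labels) if not y]
    buggy_idx = [i for i, y in enumerate(labels) if y]

    noisy = list(labels)
    # FP noise: a random subset of clean instances is relabeled as buggy.
    for i in rng.sample(clean_idx, round(fp_rate * len(clean_idx))):
        noisy[i] = True
    # FN noise: a random subset of buggy instances is relabeled as clean.
    for i in rng.sample(buggy_idx, round(fn_rate * len(buggy_idx))):
        noisy[i] = False
    return noisy


def noise_level(original, noisy):
    """Fraction of instances whose label differs from the original."""
    return sum(a != b for a, b in zip(original, noisy)) / len(original)


# Toy dataset (hypothetical): 20 buggy and 80 clean instances.
labels = [True] * 20 + [False] * 80
noisy = inject_label_noise(labels, fp_rate=0.25, fn_rate=0.25)
print(noise_level(labels, noisy))
```

A study in this style would train a defect predictor on the noisy labels at several rates and compare its performance against a model trained on the original labels. Sampling the indices from the original labels keeps the two noise types from overwriting each other.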