Better cross company defect prediction

Authors:
Fayola Peters;Tim Menzies;Andrian Marcus
Affiliations:
West Virginia University, USA;West Virginia University, USA;Wayne State University, USA
Venue:
Proceedings of the 10th Working Conference on Mining Software Repositories
Year:
2013

Abstract

How can we find data for quality prediction? Early in the life cycle, projects may lack the data needed to build such predictors. Prior work assumed that relevant training data was found nearest to the local project. But is this the best approach? This paper introduces the Peters filter, which is based on the following conjecture: when local data is scarce, more information exists in other projects. Accordingly, this filter selects training data via the structure of other projects. To assess the performance of the Peters filter, we compare it with two other approaches for quality prediction: within-company learning and cross-company learning with the Burak filter (the state-of-the-art relevancy filter). This paper finds that: 1) within-company predictors are weak for small data sets; 2) the Peters filter+cross-company builds better predictors than both within-company and the Burak filter+cross-company; and 3) the Peters filter builds 64% more useful predictors than both the within-company and the Burak filter+cross-company approaches. Hence, we recommend the Peters filter for cross-company learning.
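
The abstract contrasts two ways of choosing cross-company training rows: starting from the local project and taking what lies nearest to it, or starting from the other projects and letting their rows decide which local rows they resemble. The abstract does not spell out either algorithm, so the following is only an illustrative sketch under assumptions: the function names, the Euclidean distance, the parameter `k`, and the toy data are all hypothetical, and the published filters may include steps not shown here.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_to_local(cross_rows, local_rows, k=2):
    """Local-driven selection (assumed form of a nearest-neighbour relevancy
    filter): each local row pulls in its k nearest cross-company rows."""
    chosen = set()
    for local in local_rows:
        ranked = sorted(range(len(cross_rows)),
                        key=lambda i: dist(cross_rows[i], local))
        chosen.update(ranked[:k])
    return sorted(chosen)


def structure_of_cross(cross_rows, local_rows):
    """Cross-driven selection (assumed, simplified): each cross-company row
    names its nearest local row; each local row then keeps only the closest
    cross-company row that named it."""
    best = {}  # local index -> (distance, cross index)
    for i, cross in enumerate(cross_rows):
        j = min(range(len(local_rows)),
                key=lambda idx: dist(cross, local_rows[idx]))
        d = dist(cross, local_rows[j])
        if j not in best or d < best[j][0]:
            best[j] = (d, i)
    return sorted(i for _, i in best.values())


# Hypothetical two-attribute rows, for illustration only.
cross = [(0, 0), (1, 0), (5, 5), (9, 9)]
local = [(0.4, 0), (6, 6)]

local_driven = nearest_to_local(cross, local, k=2)
cross_driven = structure_of_cross(cross, local)
```

On this toy data the local-driven selection returns every cross-company row, while the cross-driven selection keeps one row per local row. That difference in how the training set is assembled is the point of the sketch; it says nothing about which selection yields better predictors.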