An investigation on the feasibility of cross-project defect prediction

  • Authors:
  • Zhimin He, Fengdi Shu, Ye Yang, Mingshu Li, Qing Wang

  • Affiliations:
  • Laboratory for Internet Software Technologies, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190 (all authors); Graduate University, Chinese Academy of Sciences, Beijing, China 100190 (first author); State Key Laboratory of Computer Science, Institute of Software, Chinese ... (fourth author)

  • Venue:
  • Automated Software Engineering
  • Year:
  • 2012


Abstract

Software defect prediction helps optimize the allocation of testing resources by identifying defect-prone modules prior to testing. Most existing models build their prediction capability from a set of historical data, presumably drawn from the same or similar project settings as those under prediction. However, such historical data is not always available in practice. One potential way to predict defects in projects without historical data is to learn predictors from the data of other projects. This paper investigates defect prediction in the cross-project context, focusing on the selection of training data. We conduct three large-scale experiments on 34 data sets obtained from 10 open source projects. The major conclusions from our experiments are: (1) in the best cases, training data from other projects can provide better prediction results than training data from the same project; (2) prediction results obtained using training data from other projects meet our acceptance criteria at the average level: in 18 out of 34 cases, defects were predicted with Recall greater than 70% and Precision greater than 50%; (3) the results of cross-project defect prediction are related to the distributional characteristics of the data sets, which is valuable information for training data selection. We further propose an approach to automatically select suitable training data for projects without historical data. The prediction results obtained with training data selected by our approach are comparable with those obtained with training data from the same project.
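
Conclusion (3) and the proposed approach suggest choosing a source project whose data sets have distributional characteristics close to the target project's. The abstract does not specify which statistics or distance measure are used, so the sketch below is only illustrative, not the authors' method: it summarizes each project's metric matrix with simple per-metric statistics (mean, standard deviation, median, all assumed here for illustration) and selects the candidate with the smallest Euclidean distance to the target.

```python
# Illustrative sketch of distribution-based training data selection.
# The summary statistics and the Euclidean distance are assumptions;
# the paper's actual selection procedure is not given in the abstract.
import numpy as np

def characteristic_vector(metrics):
    """Summarize a project's metric matrix (rows = modules, cols = metrics)."""
    return np.concatenate([metrics.mean(axis=0),
                           metrics.std(axis=0),
                           np.median(metrics, axis=0)])

def select_training_project(target_metrics, candidate_sets):
    """Return the candidate project whose distributional characteristics
    are closest (in Euclidean distance) to the target project's."""
    target_vec = characteristic_vector(target_metrics)
    best_name, best_dist = None, float("inf")
    for name, metrics in candidate_sets.items():
        dist = np.linalg.norm(characteristic_vector(metrics) - target_vec)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name

# Example usage with random placeholder data (hypothetical project names):
rng = np.random.default_rng(0)
target = rng.random((50, 20))                      # 50 modules, 20 static metrics
candidates = {"projA": rng.random((80, 20)),
              "projB": rng.random((60, 20))}
print(select_training_project(target, candidates))
```

The selected candidate would then serve as the training set for a conventional defect predictor on the target project, which has no historical data of its own.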