Cross-project defect prediction: a large scale experiment on data vs. domain vs. process

Authors:
Thomas Zimmermann;Nachiappan Nagappan;Harald Gall;Emanuel Giger;Brendan Murphy
Affiliations:
Microsoft Research, Redmond, WA, USA;Microsoft Research, Redmond, WA, USA;University of Zurich, Zurich, Switzerland;University of Zurich, Zurich, Switzerland;Microsoft Research, Cambridge, United Kingdom
Venue:
Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
Year:
2009

Abstract

Prediction of software defects works well within projects as long as there is a sufficient amount of data available to train models. However, this is rarely the case for new software projects and for many companies. So far, only a few studies have focused on transferring prediction models from one project to another. In this paper, we study cross-project defect prediction models on a large scale. For 12 real-world applications, we ran 622 cross-project predictions. Our results indicate that cross-project prediction is a serious challenge: simply using models from projects in the same domain or with the same process does not lead to accurate predictions. To help software engineers choose models wisely, we identified factors that do influence the success of cross-project predictions. We also derived decision trees that can provide early estimates for precision, recall, and accuracy before a prediction is attempted.
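Each cross-project prediction above is judged by precision, recall, and accuracy. The Python sketch below illustrates, under simplifying assumptions, what one such prediction involves: a model is fitted on a source project and then applied unchanged to a target project. The one-metric threshold model (`train_threshold`), the helper names (`evaluate`, `predict`, `is_successful`), the churn values, and the 0.75 success cut-off are all hypothetical illustrations, not the paper's models, data, or criteria.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float
    accuracy: float


def evaluate(actual: Sequence[bool], predicted: Sequence[bool]) -> Metrics:
    """Score predictions (True = defect-prone) against actual outcomes."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # defective, flagged
    fp = sum(1 for a, p in pairs if not a and p)      # clean, flagged
    fn = sum(1 for a, p in pairs if a and not p)      # defective, missed
    tn = len(pairs) - tp - fp - fn                    # clean, not flagged
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return Metrics(precision, recall, accuracy)


def predict(cutoff: float, metric: Sequence[float]) -> List[bool]:
    """Flag every component whose metric value reaches the cut-off."""
    return [m >= cutoff for m in metric]


def train_threshold(metric: Sequence[float], defective: Sequence[bool]) -> float:
    """Hypothetical one-metric model: pick the cut-off with the best training accuracy."""
    best_cutoff, best_accuracy = float("inf"), -1.0
    for cutoff in sorted(set(metric)):
        accuracy = evaluate(defective, predict(cutoff, metric)).accuracy
        if accuracy > best_accuracy:  # keep the first (lowest) best cut-off
            best_cutoff, best_accuracy = cutoff, accuracy
    return best_cutoff


def is_successful(m: Metrics, threshold: float = 0.75) -> bool:
    """Assumed success criterion: all three measures reach the threshold."""
    return min(m.precision, m.recall, m.accuracy) >= threshold


# Hypothetical per-component churn values and defect labels for two projects.
SOURCE_CHURN = [10, 20, 30, 40, 50, 60]
SOURCE_DEFECTIVE = [False, False, False, True, True, True]
TARGET_CHURN = [5, 15, 45, 55, 35, 70]
TARGET_DEFECTIVE = [False, False, True, False, True, True]

cutoff = train_threshold(SOURCE_CHURN, SOURCE_DEFECTIVE)
within = evaluate(SOURCE_DEFECTIVE, predict(cutoff, SOURCE_CHURN))  # scored on its own project
cross = evaluate(TARGET_DEFECTIVE, predict(cutoff, TARGET_CHURN))   # same model, transferred unchanged
print(cutoff, within, is_successful(within))
print(cross, is_successful(cross))
```

With these toy numbers the learned cut-off is a churn value of 40. It classifies the source project perfectly, but on the target project precision, recall, and accuracy all drop to 2/3, so the assumed criterion is not met. This is only a miniature illustration of why a model that works within one project may not transfer to another.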