Empirical evaluation of the effects of mixed project data on learning defect predictors

Authors:
Burak Turhan; Ayşe Tosun Mısırlı; Ayşe Bener
Affiliations:
Dept. of Information Processing Science, University of Oulu, 90014 Oulu, Finland; Dept. of Information Processing Science, University of Oulu, 90014 Oulu, Finland; Ted Rogers School of ITM, Ryerson University, Toronto, ON, Canada M5B 2K3
Venue:
Information and Software Technology
Year:
2013

Abstract

Context: Defect prediction research mostly focuses on optimizing the performance of models built for isolated projects (i.e. within project (WP)) through retrospective analyses. More recent studies, on the other hand, use data across projects (i.e. cross project (CP)) to build defect prediction models for new projects. There are no studies in which within and cross (i.e. mixed) project data are used together.
Objective: Our goal is to investigate the merits of using mixed project data for binary defect prediction. Specifically, we check whether it is feasible, in terms of defect detection performance, to use data from other projects in two cases: (i) when there is an existing within project history and (ii) when within project data are limited.
Method: We use publicly available data from 73 versions of 41 projects. We simulate the two cases above and compare the performance of naive Bayes classifiers trained on within project data with that of naive Bayes classifiers trained on mixed project data.
Results: For the first case, we find that the performance of mixed project predictors improves significantly over that of full within project predictors (p-value