Empirical evaluation of the effects of mixed project data on learning defect predictors

  • Authors:
  • Burak Turhan; Ayşe Tosun Mısırlı; Ayşe Bener

  • Affiliations:
  • Dept. of Information Processing Science, University of Oulu, 90014 Oulu, Finland; Dept. of Information Processing Science, University of Oulu, 90014 Oulu, Finland; Ted Rogers School of ITM, Ryerson University, Toronto, ON, Canada M5B 2K3

  • Venue:
  • Information and Software Technology
  • Year:
  • 2013

Abstract

Context: Defect prediction research mostly focuses on optimizing the performance of models constructed for isolated projects (i.e., within project (WP)) through retrospective analyses. Recent studies, on the other hand, try to use data across projects (i.e., cross project (CP)) to build defect prediction models for new projects. There are no cases where a combination of within and cross (i.e., mixed) project data is used.

Objective: Our goal is to investigate the merits of using mixed project data for binary defect prediction. Specifically, we want to check whether it is feasible, in terms of defect detection performance, to use data from other projects in two cases: (i) when there is an existing within project history and (ii) when there are limited within project data.

Method: We use data from 73 versions of 41 publicly available projects. We simulate the two above-mentioned cases and compare the performance of naive Bayes classifiers trained on within project data vs. mixed project data.

Results: For the first case, we find that the performance of mixed project predictors significantly improves over full within project predictors (p-value
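
The comparison described under Method can be illustrated with a minimal sketch. It is not the authors' experimental setup. It assumes a Gaussian naive Bayes variant, synthetic two-metric data, and recall plus false-alarm rate as the performance measures. All function names (`fit_gnb`, `predict_gnb`, `detection_rates`, `synthetic_project`) and all data parameters are hypothetical.

```python
import math
import random


def fit_gnb(X, y):
    """Fit Gaussian naive Bayes: per-class log prior, feature means, variances."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small floor keeps variances positive for constant features.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (math.log(n / len(y)), means, variances)
    return model


def predict_gnb(model, X):
    """Return the class with the highest log posterior for each row."""
    preds = []
    for x in X:
        scores = {}
        for label, (log_prior, means, variances) in model.items():
            log_lik = sum(-0.5 * math.log(2 * math.pi * var)
                          - (v - m) ** 2 / (2 * var)
                          for v, m, var in zip(x, means, variances))
            scores[label] = log_prior + log_lik
        preds.append(max(scores, key=scores.get))
    return preds


def detection_rates(y_true, y_pred):
    """Return (recall, false-alarm rate) for the defective class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    return recall, false_alarm


def synthetic_project(rng, n, shift):
    """Hypothetical data: two metrics per module, larger for defective ones."""
    X, y = [], []
    for _ in range(n):
        defective = 1 if rng.random() < 0.3 else 0
        centre = shift + 2.0 * defective
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(defective)
    return X, y


rng = random.Random(0)
wp_X, wp_y = synthetic_project(rng, 200, shift=0.0)      # within project history
cp_X, cp_y = synthetic_project(rng, 400, shift=0.5)      # data from other projects
test_X, test_y = synthetic_project(rng, 200, shift=0.0)  # later modules, same project

wp_model = fit_gnb(wp_X, wp_y)
mixed_model = fit_gnb(wp_X + cp_X, wp_y + cp_y)  # mixed = WP and CP pooled

wp_rates = detection_rates(test_y, predict_gnb(wp_model, test_X))
mixed_rates = detection_rates(test_y, predict_gnb(mixed_model, test_X))
print("WP    (recall, false alarm):", wp_rates)
print("Mixed (recall, false alarm):", mixed_rates)
```

Shrinking the within project sample (for example, 20 modules instead of 200) gives a rough analogue of the limited-data case. Numbers from this synthetic setup say nothing about the results reported in the paper.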