Recalling the "imprecision" of cross-project defect prediction
Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering
Empirical evaluation of the effects of mixed project data on learning defect predictors
Information and Software Technology
How, and why, process metrics are better
Proceedings of the 2013 International Conference on Software Engineering
Hi-index | 0.00 |
Defect prediction research mostly focus on optimizing the performance of models that are constructed for isolated projects. On the other hand, recent studies try to utilize data across projects for building defect prediction models. We combine both approaches and investigate the effects of using mixed (i.e. within and cross) project data on defect prediction performance, which has not been addressed in previous studies. We conduct experiments to analyze models learned from mixed project data using ten proprietary projects from two different organizations. We observe that code metric based mixed project models yield only minor improvements in the prediction performance for a limited number of cases that are difficult to characterize. Based on existing studies and our results, we conclude that using cross project data for defect prediction is still an open challenge that should only be considered in environments where there is no local data collection activity, and using data from other projects in addition to a project's own data does not pay off in terms of performance.