Software defect prediction has been a popular research topic in recent years and is considered a means to optimize quality assurance activities. Defect prediction can be performed in a within-project or a cross-project scenario. The within-project scenario produces high-quality results, but requires historical data from the project, which is often not available. For cross-project prediction, data availability is not an issue, as data from other projects is readily available, e.g., in repositories like PROMISE. However, the quality of cross-project defect prediction results is too low for practical use. Recent research has shown that the selection of appropriate training data can improve the quality of cross-project defect predictions. In this paper, we propose distance-based strategies for the selection of training data based on distributional characteristics of the available data. We evaluate the proposed strategies in a large case study with 44 data sets obtained from 14 open source projects. Our results show that our training data selection strategy significantly improves the success rate of cross-project defect predictions. However, the quality of the results still cannot compete with within-project defect prediction.
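The selection strategy described above can be illustrated with a minimal sketch: each candidate project's metric data is summarized by distributional characteristics, and the training projects closest to the target project in that characteristic space are selected. This is only an assumed, simplified illustration — the function names (`characteristic_vector`, `select_training_projects`), the choice of mean and standard deviation as characteristics, and the Euclidean distance are hypothetical; the paper's actual strategies may use different characteristics and distance measures.

```python
import numpy as np

def characteristic_vector(metric_data):
    # Summarize a project's metric matrix (rows = modules, columns = metrics)
    # by distributional characteristics; here, per-metric mean and std.
    # (Assumed choice of characteristics, for illustration only.)
    data = np.asarray(metric_data, dtype=float)
    return np.concatenate([data.mean(axis=0), data.std(axis=0)])

def select_training_projects(target_data, candidate_data, k=3):
    # Rank candidate projects by Euclidean distance between their
    # characteristic vectors and the target's; return the k closest.
    target_vec = characteristic_vector(target_data)
    distances = {
        name: np.linalg.norm(characteristic_vector(data) - target_vec)
        for name, data in candidate_data.items()
    }
    return sorted(distances, key=distances.get)[:k]
```

A defect prediction model would then be trained only on the data of the selected projects rather than on all available cross-project data.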