Context: Software defect prediction studies usually build models from within-company data; very few examine prediction models trained on cross-company data. Models built on within-company data are difficult to apply in practice because many companies lack such local data repositories. Recently, transfer learning has attracted growing attention for building a classifier in a target domain using data from a related source domain. It is very useful when the distributions of training and test instances differ, but is it appropriate for cross-company software defect prediction? Objective: In this paper, we consider the cross-company defect prediction scenario, where source and target data are drawn from different companies. To harness cross-company data, we use transfer learning to build a faster and highly effective prediction model. Method: Unlike prior work that selects training data similar to the test data, we propose a novel algorithm called Transfer Naive Bayes (TNB), which uses the information of all the proper features in the training data. Our solution estimates the distribution of the test data and transfers cross-company data information into the weights of the training data. The defect prediction model is then built on these weighted data. Results: This article presents a theoretical analysis of the comparative methods and reports experimental results on data sets from different organizations. The results indicate that TNB is more accurate in terms of AUC (the area under the receiver operating characteristic curve) and requires less runtime than the state-of-the-art methods. Conclusion: When there are too few local training data to train good classifiers, useful knowledge from different-distribution training data at the feature level may help.
We are optimistic that our transfer learning method can guide optimal resource allocation strategies, which may reduce the cost of software testing and increase the effectiveness of the testing process.
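The weighting idea in the Method paragraph can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything specific here is an assumption made for illustration: the test data's distribution is summarized by per-feature minimum and maximum values, each training instance is weighted by a gravity-like function of how many of its features fall inside those ranges, and a Laplace-smoothed naive Bayes classifier is then fitted on the weighted counts. The function names (`instance_weights`, `fit_weighted_nb`, `predict_proba`), the weighting formula, and the toy data are hypothetical and should not be read as the paper's implementation.

```python
import math
from collections import defaultdict


def instance_weights(train_X, test_X):
    """Weight each training row by its similarity to the test data.

    Assumption: similarity s = number of features inside the test data's
    per-feature [min, max] range, and weight = s / (k - s + 1)^2.
    """
    k = len(test_X[0])
    lo = [min(row[j] for row in test_X) for j in range(k)]
    hi = [max(row[j] for row in test_X) for j in range(k)]
    weights = []
    for row in train_X:
        s = sum(1 for j in range(k) if lo[j] <= row[j] <= hi[j])
        weights.append(s / (k - s + 1) ** 2)
    return weights


def fit_weighted_nb(train_X, train_y, weights, n_values):
    """Fit naive Bayes on discrete features using weighted counts.

    n_values[j] is the number of distinct values feature j can take
    (assumed known, e.g. the number of discretization bins).
    """
    classes = sorted(set(train_y))
    total = sum(weights)
    class_w = {c: 0.0 for c in classes}
    feat_w = defaultdict(float)  # (class, feature index, value) -> summed weight
    for row, y, w in zip(train_X, train_y, weights):
        class_w[y] += w
        for j, v in enumerate(row):
            feat_w[(y, j, v)] += w
    # Laplace-smoothed class priors computed from weights, not raw counts.
    prior = {c: (class_w[c] + 1.0) / (total + len(classes)) for c in classes}
    return {"classes": classes, "prior": prior, "class_w": class_w,
            "feat_w": feat_w, "n_values": n_values}


def predict_proba(model, row):
    """Return normalized class probabilities for one discretized instance."""
    log_scores = {}
    for c in model["classes"]:
        score = math.log(model["prior"][c])
        for j, v in enumerate(row):
            num = model["feat_w"].get((c, j, v), 0.0) + 1.0
            den = model["class_w"][c] + model["n_values"][j]
            score += math.log(num / den)
        log_scores[c] = score
    top = max(log_scores.values())
    exp = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}


if __name__ == "__main__":
    # Toy data: two features already discretized into bins 0..2; label 1 = defective.
    train_X = [[0, 0], [0, 1], [2, 2], [2, 1], [1, 2]]
    train_y = [0, 0, 1, 1, 1]
    test_X = [[0, 1], [1, 1], [1, 2]]

    w = instance_weights(train_X, test_X)
    model = fit_weighted_nb(train_X, train_y, w, n_values=[3, 3])
    print(w)
    print(predict_proba(model, test_X[0]))
```

In this sketch, training rows that fall entirely inside the test ranges receive much larger weights than rows that only partly match, so they dominate the class priors and conditional probabilities. Because the weights are computed in a single pass over the training data, no instance selection or repeated retraining is needed, which is one plausible reading of the runtime claim in the abstract.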