Transfer learning for cross-company software defect prediction

Authors:
Ying Ma;Guangchun Luo;Xue Zeng;Aiguo Chen
Affiliations:
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China;School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China;School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China and Department of Computer Science, University of California, Los Angele ...;School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
Venue:
Information and Software Technology
Year:
2012

Abstract

Context: Software defect prediction studies usually build models from within-company data; very few examine prediction models trained on cross-company data. Within-company models are difficult to apply in practice because many companies lack local data repositories. Transfer learning has recently attracted growing attention as a way to build a classifier in a target domain using data from a related source domain. It is useful when the distributions of training and test instances differ, but is it appropriate for cross-company software defect prediction?

Objective: We consider the cross-company defect prediction scenario in which source and target data are drawn from different companies. To harness cross-company data, we exploit transfer learning to build a faster and highly effective prediction model.

Method: Unlike prior work that selects training data similar to the test data, we propose a novel algorithm called Transfer Naive Bayes (TNB), which uses the information of all the proper features in the training data. Our solution estimates the distribution of the test data and transfers cross-company data information into the weights of the training data. The defect prediction model is then built on these weighted data.

Results: This article presents a theoretical analysis of the comparative methods and reports experimental results on data sets from different organizations. The results indicate that TNB is more accurate in terms of AUC (the area under the receiver operating characteristic curve) and requires less runtime than the state-of-the-art methods.

Conclusion: When there are too few local training data to train good classifiers, useful feature-level knowledge from training data with a different distribution may help. We are optimistic that our transfer learning method can guide optimal resource allocation strategies, which may reduce software testing cost and increase the effectiveness of the software testing process.
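
The idea in the Method part of the abstract (summarize the test data, turn that summary into weights on the cross-company training data, then fit Naive Bayes on the weighted data) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the weighting formula, so the range-based similarity and the expression `s / (k - s + 1) ** 2` are assumptions made for this sketch, as are the Gaussian likelihood, the function names, and the toy data.

```python
import math


def instance_weights(train_X, test_X):
    """Weight each training row by how many of its features fall inside the
    per-feature [min, max] range seen in the unlabeled test data.
    ASSUMPTION: the formula s / (k - s + 1) ** 2 is illustrative only."""
    k = len(test_X[0])
    lo = [min(row[j] for row in test_X) for j in range(k)]
    hi = [max(row[j] for row in test_X) for j in range(k)]
    weights = []
    for row in train_X:
        s = sum(1 for j in range(k) if lo[j] <= row[j] <= hi[j])
        weights.append(s / (k - s + 1) ** 2)
    return weights


def fit_weighted_gaussian_nb(train_X, train_y, weights):
    """Fit class priors, means, and variances from weighted training rows.
    ASSUMPTION: a Gaussian likelihood is used here for brevity."""
    k = len(train_X[0])
    total = sum(weights)
    classes = sorted(set(train_y))
    model = {}
    for c in classes:
        rows = [(x, w) for x, y, w in zip(train_X, train_y, weights) if y == c]
        wc = sum(w for _, w in rows)
        # Smoothed prior computed from weighted counts.
        prior = (wc + 1.0) / (total + len(classes))
        means, variances = [], []
        for j in range(k):
            if wc > 0:
                m = sum(w * x[j] for x, w in rows) / wc
                v = sum(w * (x[j] - m) ** 2 for x, w in rows) / wc
            else:
                m, v = 0.0, 1.0  # fallback when a class has zero total weight
            means.append(m)
            variances.append(v + 1e-6)  # small floor avoids division by zero
        model[c] = (prior, means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior for one instance."""
    best, best_score = None, -math.inf
    for c, (prior, means, variances) in model.items():
        score = math.log(prior)
        for xj, m, v in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        if score > best_score:
            best, best_score = c, score
    return best


# Toy data (hypothetical): label 1 = defective, 0 = not defective.
train_X = [[1, 10], [2, 12], [8, 40], [9, 42], [50, 300]]
train_y = [0, 0, 1, 1, 1]
test_X = [[0, 9], [10, 45], [2, 12]]  # unlabeled target-company data

weights = instance_weights(train_X, test_X)
model = fit_weighted_gaussian_nb(train_X, train_y, weights)
predictions = [predict(model, x) for x in test_X]
```

In this toy setup the last training row lies entirely outside the range of the test data, so it receives zero weight and does not influence the fitted class statistics. Every training row is still considered; rows are down-weighted rather than filtered out, which is the contrast with instance-selection approaches that the abstract draws.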