Sample-based software defect prediction with active and semi-supervised learning

Authors:
Ming Li;Hongyu Zhang;Rongxin Wu;Zhi-Hua Zhou
Affiliations:
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China 210093;MOE Key Laboratory for Information System Security, Tsinghua University, Beijing, China 100084;MOE Key Laboratory for Information System Security, Tsinghua University, Beijing, China 100084;National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China 210093
Venue:
Automated Software Engineering
Year:
2012

Abstract

Software defect prediction can help us better understand and control software quality. Current defect prediction techniques mainly rely on a sufficient amount of historical project data. However, historical data is often not available for new projects and for many organizations. In this case, effective defect prediction is difficult to achieve. To address this problem, we propose sample-based methods for software defect prediction. For a large software system, we can select and test a small percentage of modules, and then build a defect prediction model to predict the defect-proneness of the rest of the modules. In this paper, we describe three methods for selecting a sample: random sampling with conventional machine learners, random sampling with a semi-supervised learner, and active sampling with an active semi-supervised learner. To facilitate the active sampling, we propose a novel active semi-supervised learning method, ACoForest, which is able to sample the modules that are most helpful for learning a good prediction model. Our experiments on PROMISE datasets show that the proposed methods are effective and have the potential to be applied to industrial practice.
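The sample-based workflow described in the abstract (inspect a small fraction of modules, learn from them, predict the rest) can be illustrated with a minimal sketch of the first method, random sampling with a conventional learner. Everything here is an assumption made for illustration: the nearest-centroid classifier is a toy stand-in for a conventional machine learner, the two metrics and the synthetic labels are hypothetical, and the function names are invented. It is not ACoForest and does not reproduce the paper's experiments.

```python
import random

def random_sample(n_modules, fraction, seed=0):
    """Pick a random subset of module indices to inspect and label."""
    k = max(2, round(n_modules * fraction))
    return sorted(random.Random(seed).sample(range(n_modules), k))

def centroid(rows):
    """Mean metric vector of a non-empty list of metric vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]

def fit_nearest_centroid(metrics, labels):
    """Toy stand-in for a conventional learner: one centroid per observed class."""
    by_class = {}
    for x, y in zip(metrics, labels):
        by_class.setdefault(y, []).append(x)
    return {y: centroid(rows) for y, rows in by_class.items()}

def predict(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: dist(model[y]))

def sample_based_prediction(metrics, inspect, sampled_ids):
    """Label only the sampled modules, then predict every other module."""
    labels = [inspect(i) for i in sampled_ids]
    model = fit_nearest_centroid([metrics[i] for i in sampled_ids], labels)
    sampled = set(sampled_ids)
    return {i: predict(model, metrics[i])
            for i in range(len(metrics)) if i not in sampled}

# Hypothetical, well-separated data: every fifth module is defective.
# Each module has two made-up metrics: (lines of code, complexity).
truth = [i % 5 == 0 for i in range(200)]
metrics = [(600.0 + i, 30.0) if defective else (100.0 + i % 50, 5.0)
           for i, defective in enumerate(truth)]

# "Testing" a sampled module is simulated by revealing its hidden label.
sampled_ids = random_sample(len(metrics), fraction=0.10)
predicted = sample_based_prediction(metrics, lambda i: truth[i], sampled_ids)
accuracy = sum(predicted[i] == truth[i] for i in predicted) / len(predicted)
print(f"inspected {len(sampled_ids)} modules, accuracy on the rest: {accuracy:.2f}")
```

Only the sampled modules are ever labeled, which mirrors the setting where no historical defect data exists. The semi-supervised and active variants named in the abstract would additionally exploit the unlabeled modules or choose which modules to inspect next; neither is shown here.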