Software defect detection aims to automatically identify defective software modules so that testing can be carried out efficiently and the quality of a software system improved. Although many machine learning methods have been successfully applied to the task, most of them fail to consider two practical yet important issues in software defect detection. First, it is difficult to collect a large amount of labeled training data for learning a well-performing model. Second, a software system usually contains far fewer defective modules than defect-free modules, so learning has to be conducted over an imbalanced data set. In this paper, we address these two practical issues simultaneously by proposing a novel semi-supervised learning approach named ROCUS. This method exploits the abundant unlabeled examples to improve detection accuracy and employs under-sampling to tackle the class-imbalance problem in the learning process. Experimental results on real-world software defect detection tasks show that ROCUS is effective for software defect detection. It outperforms both a semi-supervised learning method that ignores the class-imbalance nature of the task and a class-imbalance learning method that does not make effective use of unlabeled data.
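The under-sampling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the ROCUS algorithm itself, whose details the abstract does not give. It only shows plain random under-sampling, in which majority-class (defect-free) examples are dropped until the classes are the same size. The function name `undersample`, the toy metric vectors, and the label encoding (1 = defective, 0 = defect-free) are assumptions made for illustration.

```python
import random
from collections import Counter


def undersample(examples, labels, seed=0):
    """Randomly drop examples from larger classes until every class
    has as many examples as the smallest class (illustrative only)."""
    rng = random.Random(seed)

    # Group the examples by their class label.
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)

    # Size of the smallest (minority) class.
    n_min = min(len(xs) for xs in by_class.values())

    balanced = []
    for y, xs in by_class.items():
        # Keep the minority class whole; sample the others down to n_min.
        kept = xs if len(xs) == n_min else rng.sample(xs, n_min)
        balanced.extend((x, y) for x in kept)

    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]


# Toy data (hypothetical): 10 modules described by two made-up metrics,
# of which only the first two are defective (label 1).
modules = [[i, i * 2] for i in range(10)]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

bal_x, bal_y = undersample(modules, labels)
print(Counter(bal_y))
```

A classifier trained on `bal_x` and `bal_y` sees both classes equally often, at the cost of discarding some defect-free examples. According to the abstract, ROCUS combines under-sampling with the use of unlabeled examples, which this sketch does not attempt to reproduce.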