Software defect detection with ROCUS

Authors:
Yuan Jiang;Ming Li;Zhi-Hua Zhou
Affiliations:
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China;National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China;National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Venue:
Journal of Computer Science and Technology
Year:
2011

Abstract

Software defect detection aims to automatically identify defective software modules so that testing effort can be focused efficiently and the quality of a software system improved. Although many machine learning methods have been successfully applied to the task, most of them fail to consider two practical yet important issues in software defect detection. First, it is rather difficult to collect a large amount of labeled training data for learning a well-performing model; second, a software system usually contains far fewer defective modules than defect-free modules, so learning has to be conducted over an imbalanced data set. In this paper, we address these two practical issues simultaneously by proposing a novel semi-supervised learning approach named ROCUS. This method exploits the abundant unlabeled examples to improve detection accuracy and employs under-sampling to tackle the class-imbalance problem in the learning process. Experimental results on real-world software defect detection tasks show that ROCUS is effective for software defect detection. It outperforms both a semi-supervised learning method that ignores the class-imbalance nature of the task and a class-imbalance learning method that does not make effective use of unlabeled data.
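The two ideas named in the abstract, exploiting unlabeled modules and under-sampling the majority class, can be illustrated with a small sketch. This is not the ROCUS algorithm itself, whose details are not given here. It is a generic committee-based self-labeling loop written under stated assumptions: the nearest-centroid base learner, the unanimous-vote labeling rule, and all function names (`undersample`, `train_committee`, `semi_supervised_fit`) are hypothetical choices made for illustration only.

```python
import random

def undersample(X, y, rng):
    """Randomly drop majority-class examples until both classes are the same size.
    Assumes at least one example of each class (1 = defective, 0 = defect-free)."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    return [X[i] for i in keep], [y[i] for i in keep]

def fit_centroids(X, y):
    """Toy base learner (an assumption): one mean feature vector per class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Return the label of the closest class centroid (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

def train_committee(X, y, n_members, rng):
    """Train each member on its own balanced, under-sampled copy of the data."""
    return [fit_centroids(*undersample(X, y, rng)) for _ in range(n_members)]

def vote(committee, x):
    """Fraction of committee members that predict 'defective' for x."""
    return sum(predict(m, x) for m in committee) / len(committee)

def semi_supervised_fit(X_lab, y_lab, X_unlab, n_members=5, rounds=3, seed=0):
    """Repeatedly pseudo-label unlabeled modules on which the committee is unanimous."""
    rng = random.Random(seed)
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(rounds):
        committee = train_committee(X, y, n_members, rng)
        remaining = []
        for x in pool:
            p = vote(committee, x)
            if p == 1.0:        # every member says defective
                X.append(x); y.append(1)
            elif p == 0.0:      # every member says defect-free
                X.append(x); y.append(0)
            else:               # disagreement: leave unlabeled for now
                remaining.append(x)
        pool = remaining
    return train_committee(X, y, n_members, rng)

# Made-up two-feature module metrics: 6 defect-free, 2 defective, 4 unlabeled.
X_lab = [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (2, 0), (9, 9), (10, 10)]
y_lab = [0, 0, 0, 0, 0, 0, 1, 1]
X_unlab = [(0.5, 0.5), (9.5, 9.5), (10, 9), (1, 2)]
committee = semi_supervised_fit(X_lab, y_lab, X_unlab)
```

Calling `vote(committee, module)` then gives the share of members that flag a module as defective. Under-sampling inside each member keeps the rare defective class from being swamped, while the unanimous-vote rule is a deliberately conservative way to add pseudo-labels; a real system would likely use a stronger base learner and a tuned confidence threshold.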