A Fault Prediction Model with Limited Fault Data to Improve Test Process

Authors:
Cagatay Catal; Banu Diri
Affiliations:
The Scientific and Technological Research Council of Turkey, Marmara Research Center, Information Technologies Institute, Kocaeli, Turkey; Department of Computer Engineering, Yildiz Technical University, Istanbul, Turkey
Venue:
PROFES '08 Proceedings of the 9th international conference on Product-Focused Software Process Improvement
Year:
2008

Abstract

Software fault prediction models are used to identify fault-prone software modules and to help produce reliable software. The performance of a software fault prediction model is correlated with the available software metrics and fault data. On some occasions, only a few software modules have fault data, and prediction models that use only labeled data therefore cannot provide accurate results. Semi-supervised learning approaches, which benefit from both unlabeled and labeled data, may be applied in this case. In this paper, we propose an artificial immune system based semi-supervised learning approach. The proposed approach uses a recent semi-supervised algorithm called YATSI (Yet Another Two Stage Idea), and AIRS (Artificial Immune Recognition System) is applied in the first stage of YATSI. In addition, AIRS, the RF (Random Forests) classifier, AIRS based YATSI, and RF based YATSI are benchmarked. Experimental results showed that while the YATSI algorithm improved the performance of AIRS, it diminished the performance of RF on unbalanced datasets. Furthermore, the performance of AIRS based YATSI is comparable with that of RF, which is the best machine learning classifier according to some studies.
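
The two-stage idea behind YATSI can be illustrated with a minimal Python sketch. It is not the authors' implementation. A simple nearest-centroid classifier stands in for AIRS or RF in the first stage, and the names `train_centroids`, `predict_centroid`, and `yatsi_predict`, the parameters `k` and `f`, and the toy metric values are all hypothetical. The weight given to pre-labeled modules (`f * N / M`, for `N` labeled and `M` unlabeled modules) is an assumption of this sketch.

```python
import math
from collections import defaultdict


def train_centroids(X, y):
    """Stage-one stand-in: a nearest-centroid classifier (NOT AIRS or RF)."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] += 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict_centroid(centroids, row):
    """Return the class whose centroid is closest to the given row."""
    return min(centroids, key=lambda c: math.dist(centroids[c], row))


def yatsi_predict(X_lab, y_lab, X_unl, query, k=3, f=1.0):
    """Classify `query` with a two-stage, YATSI-style procedure."""
    # Stage 1: train on the labeled modules, then pre-label the unlabeled ones.
    centroids = train_centroids(X_lab, y_lab)
    pre_labels = [predict_centroid(centroids, row) for row in X_unl]

    # Labeled modules keep weight 1.0; pre-labeled modules are down-weighted.
    # The factor f * N / M is an assumption of this sketch.
    n, m = len(X_lab), len(X_unl)
    w_unl = f * n / m if m else 0.0
    pool = [(row, label, 1.0) for row, label in zip(X_lab, y_lab)]
    pool += [(row, label, w_unl) for row, label in zip(X_unl, pre_labels)]

    # Stage 2: weighted k-nearest-neighbor vote over the combined pool.
    nearest = sorted(pool, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for _, label, weight in nearest:
        votes[label] += weight
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Toy data: two software metrics per module (values are made up).
    X_lab = [[1.0, 1.0], [9.0, 9.0]]
    y_lab = ["not-fault-prone", "fault-prone"]
    X_unl = [[1.5, 1.2], [0.8, 1.1], [8.5, 9.2], [9.1, 8.7]]
    print(yatsi_predict(X_lab, y_lab, X_unl, [8.8, 9.0]))
```

Replacing `train_centroids` and `predict_centroid` with another classifier changes only the first stage, which is how the abstract's AIRS based and RF based variants differ from each other.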