Background: Many statistical and machine learning techniques have been used to build predictive fault models. Traditional methods rely on supervised learning: software metrics for each module, together with the corresponding fault information available from previous projects, are used to train a fault prediction model. This approach requires a large training data set, which enables the development of effective fault prediction models. In practice, however, data collection costs and the lack of data from earlier projects or product versions may make a large fault prediction training set unattainable, and the small training set that may be available from the current project is known to degrade the performance of the fault prediction model. In semi-supervised learning approaches, software modules with both known and unknown fault content can be used for training. Aims: To implement and evaluate a semi-supervised learning approach to software fault prediction. Methods: We investigate an iterative semi-supervised approach to software quality prediction in which a base supervised learner is applied within a semi-supervised framework. Results: We varied the proportion of labeled software modules from 2% to 50% of all modules in the project. Tracking the performance of each iteration of the semi-supervised algorithm, we observe that semi-supervised learning improves fault prediction when the proportion of initially labeled software modules exceeds 5%. Conclusion: The semi-supervised approach outperforms the corresponding supervised learning approach when both use random forest as the base classification algorithm.
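The iterative scheme described in the Methods section can be sketched as a self-training loop: train a base learner on the labeled modules, label the unlabeled modules for which the prediction is confident, add them to the training set, and repeat. The sketch below is a minimal illustration, not the paper's implementation; a simple nearest-centroid classifier stands in for the random forest base learner, and the function names and the confidence threshold are illustrative assumptions.

```python
# Hypothetical sketch of iterative self-training for fault prediction.
# A nearest-centroid classifier substitutes for the random forest base
# learner used in the paper; labels are 0 (fault-free) and 1 (fault-prone).
from statistics import mean

def centroid_fit(X, y):
    """Return per-class feature centroids for a two-class problem."""
    return {c: [mean(col) for col in zip(*[x for x, lbl in zip(X, y) if lbl == c])]
            for c in (0, 1)}

def centroid_predict(model, x):
    """Predict the class with the nearest centroid and a confidence in (0.5, 1]."""
    dists = {c: sum((a - b) ** 2 for a, b in zip(x, model[c])) ** 0.5 for c in model}
    pred = min(dists, key=dists.get)
    other = max(dists, key=dists.get)
    conf = dists[other] / (dists[pred] + dists[other] + 1e-12)
    return pred, conf

def self_train(X_lab, y_lab, X_unlab, conf_threshold=0.7, max_iter=10):
    """Iteratively move confidently predicted unlabeled modules into the
    labeled set and retrain, until no confident predictions remain."""
    X_lab, y_lab, X_unlab = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_iter):
        model = centroid_fit(X_lab, y_lab)
        confident = [(i, centroid_predict(model, x))
                     for i, x in enumerate(X_unlab)]
        confident = [(i, pred) for i, (pred, conf) in confident
                     if conf >= conf_threshold]
        if not confident:
            break
        for i, pred in reversed(confident):  # pop from the end to keep indices valid
            X_lab.append(X_unlab.pop(i))
            y_lab.append(pred)
    return centroid_fit(X_lab, y_lab)
```

With a random forest base learner, the confidence would instead come from the forest's class-probability estimate, but the control flow of the loop is the same.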