We address the important problem of software quality analysis when limited software fault or fault-proneness data are available. A software quality model is typically trained using software measurement and fault data obtained from a previous release or a similar project, an approach that assumes fault data are available for all the training modules. In practice, various issues in software development can limit the availability of fault-proneness data, so the small set of modules with known fault-proneness labels may not be sufficient to capture the software quality trends of the project, and a model trained on that set alone may not yield dependable predictions. We investigate semi-supervised learning with the Expectation Maximization (EM) algorithm for software quality estimation with limited fault-proneness data. The hypothesis is that the knowledge stored in the software attributes of the unlabeled program modules will help improve software quality estimation. Software data collected from a large NASA software project is used during the semi-supervised learning process, and the resulting software quality model is evaluated with multiple test datasets collected from other NASA software projects. Compared to software quality models trained only with the available set of labeled program modules, the EM-based semi-supervised learning scheme improves the generalization performance of the software quality models.
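The paper does not include an implementation, but the core idea — fit a generative classifier on the few labeled modules, then iterate E-steps (soft-labeling the unlabeled modules) and M-steps (re-estimating parameters from labeled and soft-labeled data together) — can be sketched as follows. This is a minimal illustration using a two-class Gaussian naive Bayes model over module attribute vectors; the function names, the choice of naive Bayes, and the synthetic data are assumptions for illustration, not the paper's actual setup.

```python
import numpy as np

def _posterior(X, priors, means, vars_):
    # Log-likelihood of each row under each class, assuming feature
    # independence (naive Bayes with per-feature Gaussians), then
    # normalized into posterior class probabilities.
    ll = -0.5 * (((X[:, None, :] - means) ** 2) / vars_
                 + np.log(2 * np.pi * vars_)).sum(axis=2)
    ll += np.log(priors)
    ll -= ll.max(axis=1, keepdims=True)       # for numerical stability
    p = np.exp(ll)
    return p / p.sum(axis=1, keepdims=True)

def em_semi_supervised_nb(X_l, y_l, X_u, n_iter=10):
    """EM for a 2-class Gaussian naive Bayes classifier trained on
    labeled modules (X_l, y_l) plus unlabeled modules X_u.
    Returns the fitted (priors, means, vars_)."""
    n_classes = 2
    R_l = np.eye(n_classes)[y_l]              # fixed one-hot labels
    R_u = np.full((len(X_u), n_classes), 1.0 / n_classes)
    X = np.vstack([X_l, X_u])
    for _ in range(n_iter):
        R = np.vstack([R_l, R_u])             # responsibilities
        # M-step: weighted MLE of priors and per-feature means/variances
        Nk = R.sum(axis=0)
        priors = Nk / Nk.sum()
        means = (R.T @ X) / Nk[:, None]
        vars_ = (R.T @ (X ** 2)) / Nk[:, None] - means ** 2 + 1e-6
        # E-step: refresh soft labels for the unlabeled modules only;
        # the labeled responsibilities stay anchored to the known labels.
        R_u = _posterior(X_u, priors, means, vars_)
    return priors, means, vars_

def predict(X, params):
    return _posterior(X, *params).argmax(axis=1)
```

The key design point mirrored from the abstract is that the labeled modules keep their hard labels throughout, which anchors the class identities, while the unlabeled modules contribute fractionally through their evolving posterior responsibilities.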