Investigating fault prediction capabilities of five prediction models for software quality

Authors:
Deepak Banthia;Atul Gupta
Affiliations:
Indian Institute of Information Technology Design and Manufacturing (IIITDM) Jabalpur, India;Indian Institute of Information Technology Design and Manufacturing (IIITDM) Jabalpur, India
Venue:
Proceedings of the 27th Annual ACM Symposium on Applied Computing
Year:
2012



Abstract

Predicting faults in software modules can lead to higher-quality software and a more effective development process. However, the results of a fault prediction model must be properly interpreted before they are incorporated into any decision making. Most earlier studies have used prediction accuracy as the main criterion for comparing competing fault prediction models. We show that, besides accuracy, other criteria such as the number of false positives and false negatives can be equally important when choosing a candidate model for fault prediction. We used five NASA software data sets in our experiment. Our results suggest that Simple Logistic performs better than the other models on the raw data sets, whereas Neural Network performs better when a dimensionality reduction method is applied to the raw data sets. When we used data pre-processing techniques, the prediction accuracy of Random Forest was better in both cases, i.e., with and without dimensionality reduction, but Simple Logistic was more reliable than Random Forest because it produced fewer false negatives.
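
The point that accuracy alone can hide important differences between models can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the labels, the two hypothetical models ("Model A" and "Model B"), and the helper names do not come from the paper or from the NASA data sets. The two models reach the same accuracy but differ in how many faulty modules they miss.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # faulty modules predicted faulty
    fp: int  # clean modules predicted faulty (false positives)
    tn: int  # clean modules predicted clean
    fn: int  # faulty modules predicted clean (false negatives)


def confusion_counts(actual, predicted):
    """Count the four outcomes for binary labels (1 = faulty, 0 = clean)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif p:
            fp += 1
        elif a:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def accuracy(c):
    """Fraction of modules classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Hypothetical data: 4 faulty and 6 clean modules.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # misses two faulty modules
model_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # flags two clean modules

counts_a = confusion_counts(actual, model_a)
counts_b = confusion_counts(actual, model_b)

for name, c in (("Model A", counts_a), ("Model B", counts_b)):
    print(f"{name}: accuracy={accuracy(c):.2f} FP={c.fp} FN={c.fn}")
```

Both hypothetical models classify 8 of 10 modules correctly, yet Model A lets two faulty modules through while Model B misses none at the cost of two false alarms. Which trade-off is preferable depends on the relative cost of a missed fault versus an unnecessary inspection.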