Robust Prediction of Fault-Proneness by Random Forests

Authors:
Lan Guo; Yan Ma; Bojan Cukic; Harshinder Singh
Affiliations:
West Virginia University, Morgantown, WV (all authors)
Venue:
ISSRE '04: Proceedings of the 15th International Symposium on Software Reliability Engineering
Year:
2004

Abstract

Accurate prediction of fault-prone modules (a module is equivalent to a C function or a C++ method) in the software development process enables effective detection and identification of defects. Such prediction models are especially beneficial for large-scale systems, where verification experts need to focus their attention and resources on problem areas in the system under development. This paper presents a novel methodology for predicting fault-prone modules based on random forests. Random forests are an extension of decision tree learning: instead of generating one decision tree, the methodology generates hundreds or even thousands of trees using subsets of the training data, and the classification decision is obtained by voting among the trees. We applied random forests in five case studies based on NASA data sets. The prediction accuracy of the proposed methodology is generally higher than that achieved by logistic regression, discriminant analysis, and the algorithms in two machine learning software packages, WEKA [Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations] and See5 [http://www.rulequest.com/see5-info.html]. The performance difference between the proposed methodology and the other methods is statistically significant. Further, the advantage of random forests in classification accuracy over the other methods is more significant on larger data sets.
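The abstract describes random forests only at a high level: many trees grown on subsets of the training data, combined by voting. The following is a minimal, self-contained Python sketch of that idea. It assumes the common Breiman-style recipe (bootstrap samples drawn with replacement, a random subset of features considered at each split, Gini impurity, majority vote). It is not the authors' implementation, which the abstract does not specify. The function names, the parameters `n_trees`, `n_features` and `max_depth`, the two module metrics (lines of code, cyclomatic complexity) and the toy data are all hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Row = Sequence[float]


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of class labels."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: List[int]) -> int:
    """Most frequent label (ties go to the label encountered first)."""
    return Counter(labels).most_common(1)[0][0]


def best_split(X: List[Row], y: List[int], features: List[int]) -> Optional[Tuple[float, int, float]]:
    """Return (weighted Gini, feature index, threshold) of the best split, or None."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, n_features, max_depth, rng, depth=0):
    """Grow one depth-limited tree; each split considers a random feature subset."""
    if depth >= max_depth or len(set(y)) == 1:
        return ("leaf", majority(y))
    features = rng.sample(range(len(X[0])), n_features)
    split = best_split(X, y, features)
    if split is None:
        return ("leaf", majority(y))
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return ("node", f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], n_features, max_depth, rng, depth + 1),
            build_tree([X[i] for i in right], [y[i] for i in right], n_features, max_depth, rng, depth + 1))


def predict_tree(tree, row: Row) -> int:
    """Walk from the root to a leaf and return the leaf's label."""
    while tree[0] == "node":
        _, f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree[1]


def fit_forest(X, y, n_trees=25, n_features=None, max_depth=8, seed=0):
    """Grow n_trees trees, each on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    if n_features is None:
        n_features = max(1, int(math.sqrt(len(X[0]))))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], n_features, max_depth, rng))
    return forest


def predict_forest(forest, row: Row) -> int:
    """Classification decision by majority vote over all trees."""
    return majority([predict_tree(tree, row) for tree in forest])


# Hypothetical toy data: each module is (lines_of_code, cyclomatic_complexity);
# label 1 = fault-prone. Here the label depends only on complexity.
X = [(float(loc), float(cc)) for loc in range(10, 310, 20) for cc in range(1, 21)]
y = [1 if cc > 10 else 0 for _, cc in X]

forest = fit_forest(X, y, n_trees=25)
accuracy = sum(predict_forest(forest, row) == lab for row, lab in zip(X, y)) / len(X)
print(f"training accuracy: {accuracy:.2f}")
print(predict_forest(forest, (100.0, 3.0)), predict_forest(forest, (100.0, 18.0)))
```

Training accuracy on the same data is only a sanity check, not an estimate of predictive performance; a real study would evaluate on held-out modules. For practical use, an established library is preferable to this sketch: scikit-learn's `RandomForestClassifier`, for example, exposes comparable settings as `n_estimators`, `max_features` and `max_depth`.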