On software fault prediction by mining software complexity data with dynamically filtered training sets

Authors:
Vili Podgorelec
Affiliations:
Institute of Informatics, University of Maribor, Maribor, Slovenia
Venue:
SMO'09 Proceedings of the 9th WSEAS international conference on Simulation, modelling and optimization
Year:
2009

Abstract

Software fault prediction methods are well suited to improving software reliability. Research on estimation models, metrics, and methods for measuring and improving processes and products has produced large empirical databases of software projects, and intelligent mining of these datasets can contribute substantially to software reliability. In this paper we present a study on using decision tree classifiers to predict software faults. We present a new training set filtering method intended to improve classification performance when mining software complexity measures. The improvement is expected to come from removing identified outliers from the training set. We argue that a classifier trained on a filtered dataset captures a more general knowledge model and should therefore also perform better on unseen cases. The proposed method is applied to a real-world software reliability analysis dataset, and the obtained results are discussed.
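
The idea of filtering a training set before fitting a tree classifier can be illustrated with a small sketch. The abstract does not say how outliers are identified, so the nearest-neighbour disagreement rule below is only an assumed stand-in, not the paper's method. The one-split decision stump is likewise a simplified placeholder for a full decision tree, and the two metrics (cyclomatic complexity, lines of code) and all sample values are hypothetical.

```python
import math
from collections import Counter


def knn_labels(i, X, y, k):
    """Return the labels of the k nearest neighbours of sample i (itself excluded)."""
    dists = sorted(
        (math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i
    )
    return [y[j] for _, j in dists[:k]]


def filter_training_set(X, y, k=3):
    """Drop samples whose label disagrees with the majority of their k neighbours.

    Assumption: this rule only stands in for the paper's outlier
    identification, which the abstract does not describe.
    """
    keep = []
    for i in range(len(X)):
        majority, _ = Counter(knn_labels(i, X, y, k)).most_common(1)[0]
        if majority == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def fit_stump(X, y):
    """Fit a one-split decision tree (stump) with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                errors = sum(
                    (left if row[f] <= t else right) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, left, right)
    _, f, t, left, right = best
    return lambda row: left if row[f] <= t else right


# Hypothetical modules: (cyclomatic complexity, lines of code); label 1 = faulty.
X = [(2, 40), (3, 55), (4, 60), (3, 50),
     (15, 400), (18, 520), (20, 610), (17, 450),
     (3, 45)]  # last sample: small, simple module that is nevertheless labelled faulty
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

# Filter first, then train only on the samples that survived.
X_f, y_f = filter_training_set(X, y, k=3)
predict = fit_stump(X_f, y_f)
```

Under these assumptions the atypical sample is removed before training, so the learned split reflects the dominant pattern in the data rather than the single exceptional case. Whether this generalises better on unseen modules is the question the paper examines empirically.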