Evolutionary sampling and software quality modeling of high-assurance systems

Authors:
Dennis J. Drown; Taghi M. Khoshgoftaar; Naeem Seliya
Affiliations:
Florida Atlantic University, Boca Raton, FL; Florida Atlantic University, Boca Raton, FL; University of Michigan-Dearborn, Dearborn, MI
Venue:
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
Year:
2009

Abstract

Software quality modeling for high-assurance systems, such as safety-critical systems, is adversely affected by the skewed distribution of fault-prone program modules. Because defects are sparse within such a system, software quality estimation models are harder to train and perform worse. Data sampling approaches from the data mining and machine learning literature can be used to address this class imbalance problem. We present a novel genetic algorithm-based data sampling method, named Evolutionary Sampling, to improve software quality modeling for high-assurance systems. The proposed method is compared with multiple existing data sampling techniques, including random undersampling, one-sided selection, Wilson's editing, random oversampling, cluster-based oversampling, Synthetic Minority Oversampling Technique (SMOTE), and Borderline-SMOTE. In case studies of two real-world software systems, we build C4.5- and RIPPER-based software quality models both before and after applying each data sampling technique. The empirical results show that Evolutionary Sampling improves the performance of software quality models for high-assurance systems and is significantly better than most of the existing data sampling techniques.
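
For readers unfamiliar with data sampling, the following is a minimal sketch of the two simplest baselines named in the abstract, random undersampling and random oversampling. It is not the paper's Evolutionary Sampling method, whose details are not given here. The function names, the "fp" (fault-prone) and "nfp" (not fault-prone) labels, the toy data, and the choice to balance the classes exactly 1:1 are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_undersample(X, y, minority_label, rng):
    """Randomly drop majority-class rows until both classes are the same size.

    Assumption: a 1:1 target ratio; other ratios are equally valid choices.
    """
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    kept = rng.sample(majority, k=min(len(minority), len(majority)))
    idx = sorted(minority + kept)
    return [X[i] for i in idx], [y[i] for i in idx]


def random_oversample(X, y, minority_label, rng):
    """Randomly duplicate minority-class rows until both classes are the same size.

    Assumption: at least one minority-class row exists.
    """
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    n_extra = max(0, len(majority) - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


# Hypothetical toy data: 10 program modules, each described by two metrics,
# of which only 2 are fault-prone ("fp").
X = [[loc, loc % 7] for loc in range(10, 110, 10)]
y = ["fp", "nfp", "nfp", "nfp", "nfp", "fp", "nfp", "nfp", "nfp", "nfp"]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
X_under, y_under = random_undersample(X, y, "fp", rng)
X_over, y_over = random_oversample(X, y, "fp", rng)
print(Counter(y), Counter(y_under), Counter(y_over))
```

Undersampling discards information from the majority class, while oversampling by duplication adds no new information and can encourage overfitting. Trade-offs of this kind are what the more elaborate techniques listed in the abstract try to address.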