Software quality modeling for high-assurance systems, such as safety-critical systems, is adversely affected by the skewed distribution of fault-prone program modules. Because defects occur in only a small fraction of modules, software quality estimation models are harder to train and perform worse. Data sampling approaches from the data mining and machine learning literature can be used to address this class imbalance problem. We present a novel genetic algorithm-based data sampling method, named Evolutionary Sampling, to improve software quality modeling for high-assurance systems. The proposed method is compared with multiple existing data sampling techniques, including random undersampling, one-sided selection, Wilson's editing, random oversampling, cluster-based oversampling, Synthetic Minority Oversampling Technique (SMOTE), and Borderline-SMOTE. We conduct case studies of two real-world software systems, building C4.5- and RIPPER-based software quality models both before and after applying each data sampling technique. The results show empirically that Evolutionary Sampling improves the performance of software quality models for high-assurance systems and is significantly better than most existing data sampling techniques.
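To make the class imbalance problem concrete, the following is a minimal sketch of the two simplest baselines named above, random undersampling and random oversampling. It is not the Evolutionary Sampling method and is not taken from the paper. The function names, the "fp" (fault-prone) and "nfp" (not fault-prone) labels, the toy data, and the choice to balance the classes to a 1:1 ratio are all assumptions made for illustration.

```python
import random


def split_by_class(labels, minority):
    """Return (minority_indices, majority_indices) for a list of class labels."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    return minority_idx, majority_idx


def random_undersample(labels, minority, seed=0):
    """Keep every minority row and an equal-sized random subset of majority rows.

    Assumption: the target is a 1:1 class ratio; other ratios are possible.
    """
    rng = random.Random(seed)
    minority_idx, majority_idx = split_by_class(labels, minority)
    keep = rng.sample(majority_idx, min(len(minority_idx), len(majority_idx)))
    return sorted(minority_idx + keep)


def random_oversample(labels, minority, seed=0):
    """Keep every row and duplicate random minority rows until classes are balanced."""
    rng = random.Random(seed)
    minority_idx, majority_idx = split_by_class(labels, minority)
    if not minority_idx:
        raise ValueError("no minority examples to duplicate")
    shortfall = max(0, len(majority_idx) - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(shortfall)]
    return sorted(list(range(len(labels))) + extra)


# Hypothetical toy data: 8 not-fault-prone modules and 2 fault-prone modules.
labels = ["nfp"] * 8 + ["fp"] * 2
under = random_undersample(labels, "fp")
over = random_oversample(labels, "fp")
under_counts = (sum(labels[i] == "fp" for i in under),
                sum(labels[i] == "nfp" for i in under))
over_counts = (sum(labels[i] == "fp" for i in over),
               sum(labels[i] == "nfp" for i in over))
```

Both functions return row indices rather than copies of the data, so the same indices can select feature rows and labels before training a classifier. Undersampling discards majority-class information, while oversampling repeats minority rows verbatim; methods such as SMOTE generate synthetic minority examples instead of exact duplicates.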