Reducing overfitting in genetic programming models for software quality classification

  • Authors:
  • Yi Liu; Taghi Khoshgoftaar

  • Affiliations:
  • Mathematics and Computer Science Department, Georgia College and State University, Milledgeville, GA; Empirical Software Engineering Laboratory, Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL

  • Venue:
  • HASE'04: Proceedings of the Eighth IEEE International Symposium on High Assurance Systems Engineering
  • Year:
  • 2004

Abstract

A high-assurance system is largely dependent on the quality of its underlying software. Software quality models can provide timely estimates of software quality, allowing faults to be detected and corrected before the system enters operation. A software metrics-based quality prediction model may exhibit overfitting, which occurs when the model attains good accuracy on the training data but relatively poor accuracy on the test data. We present an approach for addressing the overfitting problem in the context of software quality classification models based on genetic programming (GP); the problem has not been studied in depth for GP-based models. Overfitting limits a classification model's practical usefulness, because management is chiefly interested in how well the model performs on unseen software modules, i.e., its generalization performance. While building GP-based software quality classification models for a high-assurance telecommunications system, we observed that the GP models were prone to overfitting. We therefore utilize a random sampling technique to reduce overfitting in our GP models. Many researchers have found random sampling to be an effective method for reducing the running time of a GP run; in our study, however, we employ it to reduce overfitting, with the aim of improving the generalization capability of our GP models.
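The paper itself does not include an implementation, but the core idea of the random sampling technique can be illustrated with a short sketch: instead of evaluating each GP individual's fitness on the full training set, a fresh random subset is drawn every generation, so individuals that merely memorize one fixed sample are less likely to survive selection. The Python sketch below is a minimal toy under assumed details; the single-metric threshold classifiers, the `sample_frac` parameter, and names such as `fitness_on_sample` are illustrative stand-ins, not the authors' actual GP setup.

```python
import random

# Toy "individuals": threshold classifiers over a single software metric.
# A module whose metric value is >= threshold is predicted fault-prone (1).
def make_individual(threshold):
    return lambda metric: 1 if metric >= threshold else 0

def fitness_on_sample(individual, data, sample_frac, rng):
    """Accuracy of `individual` on a freshly drawn random subset of `data`.

    Re-drawing the subset every generation means an individual must perform
    well across many different samples to keep being selected, which
    discourages memorizing (overfitting) any single fixed training set.
    """
    k = max(1, int(sample_frac * len(data)))
    sample = rng.sample(data, k)
    return sum(individual(x) == y for x, y in sample) / k

def evolve(data, generations=30, pop_size=20, sample_frac=0.5, seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(0.0, 100.0) for _ in range(pop_size)]  # thresholds
    for _ in range(generations):
        # Fitness is computed on a new random sample each generation.
        scored = sorted(
            ((fitness_on_sample(make_individual(t), data, sample_frac, rng), t)
             for t in population),
            reverse=True,
        )
        survivors = [t for _, t in scored[: pop_size // 2]]
        # Gaussian mutation refills the population
        # (a stand-in for GP crossover/mutation on trees).
        population = survivors + [t + rng.gauss(0.0, 5.0) for t in survivors]
    # Final ranking on the full training set (sample_frac=1.0).
    return max(population,
               key=lambda t: fitness_on_sample(make_individual(t), data, 1.0, rng))

# Usage: fault-prone modules tend to have larger metric values in this toy data.
rng = random.Random(1)
data = [(rng.gauss(30, 10), 0) for _ in range(200)] + \
       [(rng.gauss(60, 10), 1) for _ in range(200)]
print("best threshold:", evolve(data))
```

In a real GP setting the same principle applies to evolved trees over many software metrics: because the fitness target shifts with each resampled subset, selection pressure favors individuals that generalize across subsets rather than fit one training set exactly.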