GA-based method for feature selection and parameters optimization for machine learning regression applied to software effort estimation

  • Authors:
  • Adriano L. I. Oliveira;Petronio L. Braga;Ricardo M. F. Lima;Márcio L. Cornélio

  • Affiliations:
  • Center of Informatics, Federal University of Pernambuco, Recife 50.732-970, Brazil;Center of Informatics, Federal University of Pernambuco, Recife 50.732-970, Brazil;Center of Informatics, Federal University of Pernambuco, Recife 50.732-970, Brazil;Center of Informatics, Federal University of Pernambuco, Recife 50.732-970, Brazil

  • Venue:
  • Information and Software Technology
  • Year:
  • 2010

Abstract

Context: In the software industry, project managers usually rely on their previous experience to estimate the number of man-hours required for each software project. The accuracy of such estimates is a key factor for the efficient application of human resources. Machine learning techniques such as radial basis function (RBF) neural networks, multi-layer perceptron (MLP) neural networks, support vector regression (SVR), bagging predictors and regression-based trees have recently been applied to estimating software development effort. Some works have demonstrated that the level of accuracy in software effort estimates strongly depends on the values of the parameters of these methods. In addition, it has been shown that the selection of the input features may also have an important influence on estimation accuracy.

Objective: This paper proposes and investigates the use of a genetic algorithm (GA) method for simultaneously (1) selecting an optimal input feature subset and (2) optimizing the parameters of machine learning methods, aiming at a higher level of accuracy for the software effort estimates.

Method: Simulations are carried out using six benchmark data sets of software projects, namely, Desharnais, NASA, COCOMO, Albrecht, Kemerer and Koten and Gray. The results are compared to those obtained by methods proposed in the literature using neural networks, support vector machines, multiple additive regression trees, bagging, and Bayesian statistical models.

Results: In all data sets, the simulations have shown that the proposed GA-based method was able to improve the performance of the machine learning methods. The simulations have also demonstrated that the proposed method outperforms some methods recently reported in the literature for software effort estimation. Furthermore, the use of GA for feature selection considerably reduced the number of input features for five of the data sets used in our analysis.

Conclusions: The combination of input feature selection and parameter optimization of machine learning methods improves the accuracy of software development effort estimates. In addition, it reduces model complexity, which may help in understanding the relevance of each input feature. Therefore, some input features can be ignored without loss of accuracy in the estimations.
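To illustrate the general idea of encoding feature selection and parameter tuning in a single chromosome, the sketch below shows one possible GA setup with SVR as the base learner. It is not the authors' implementation: the chromosome layout (binary feature mask followed by three real-valued genes for C, epsilon and gamma), the truncation selection, one-point crossover, mutation rates, and the MMRE-based fitness function are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's implementation) of GA-based
# simultaneous feature selection and SVR hyper-parameter optimization.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import KFold

rng = np.random.default_rng(42)

def decode(chrom, n_features):
    """Split a chromosome into a feature mask and SVR parameters (C, epsilon, gamma)."""
    mask = chrom[:n_features].astype(bool)
    c, eps, gam = chrom[n_features:]
    # Map the [0, 1] genes onto log-scaled parameter ranges (assumed ranges).
    return mask, 10 ** (c * 4 - 1), 10 ** (eps * 3 - 3), 10 ** (gam * 4 - 3)

def fitness(chrom, X, y):
    """Negative MMRE of an SVR trained on the selected features (higher is better)."""
    mask, C, eps, gamma = decode(chrom, X.shape[1])
    if mask.sum() == 0:                      # at least one feature must be kept
        return -np.inf
    errors = []
    for train, test in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
        model = SVR(C=C, epsilon=eps, gamma=gamma)
        model.fit(X[train][:, mask], y[train])
        pred = model.predict(X[test][:, mask])
        errors.append(np.mean(np.abs(y[test] - pred) / np.maximum(y[test], 1e-9)))
    return -np.mean(errors)

def evolve(X, y, pop_size=30, generations=40, mutation_rate=0.1):
    n = X.shape[1]
    # Chromosome: n binary feature bits followed by 3 real-valued genes in [0, 1].
    pop = np.hstack([rng.integers(0, 2, (pop_size, n)).astype(float),
                     rng.random((pop_size, 3))])
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][:pop_size // 2]]   # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n + 3)                          # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n) < mutation_rate                  # bit-flip mutation on the mask
            child[:n][flip] = 1 - child[:n][flip]
            child[n:] = np.clip(child[n:] + rng.normal(0, 0.05, 3), 0, 1)
            children.append(child)
        pop = np.vstack([parents, children])
    best = pop[np.argmax([fitness(ind, X, y) for ind in pop])]
    return decode(best, n)

# Example usage with synthetic data standing in for a software-project data set.
X = rng.random((60, 8))
y = 5 * X[:, 0] + 2 * X[:, 3] + rng.normal(0, 0.1, 60)
mask, C, eps, gamma = evolve(X, y)
print("selected features:", np.where(mask)[0],
      "C=%.3f eps=%.4f gamma=%.4f" % (C, eps, gamma))
```

In this kind of setup the fitness evaluation drives both decisions at once: a chromosome is rewarded only if its feature subset and its parameter values jointly yield low estimation error, which is the combined search the paper advocates.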