An analysis of multi-objective evolutionary algorithms for training ensemble models based on different performance measures in software effort estimation

  • Authors:
  • Leandro L. Minku; Xin Yao

  • Affiliations:
  • The University of Birmingham, Edgbaston, Birmingham, UK (both authors)

  • Venue:
  • Proceedings of the 9th International Conference on Predictive Models in Software Engineering

  • Year:
  • 2013

Abstract

Background: Previous work showed that Multi-objective Evolutionary Algorithms (MOEAs) can be used to train ensembles of learning machines for Software Effort Estimation (SEE) by optimising several performance measures concurrently. Optimisation based on three measures (LSD, MMRE and PRED(25)) was analysed and led to promising results on these and other measures.

Aims: (a) It is not known how well ensembles trained on other measures would behave for SEE, nor whether training on certain measures would improve performance specifically on those measures. (b) It is also not known whether it is best to include all SEE models created by the MOEA in the ensemble, or only the models with the best training performance on each measure being optimised. Investigating (a) and (b) is the aim of this work.

Method: MOEAs were used to train ensembles by optimising four different sets of performance measures, involving nine different measures in total. The performance of all ensembles was then compared on all nine measures. Ensembles composed of different sets of models generated by the MOEAs were also compared.

Results: (a) Ensembles trained on LSD, MMRE and PRED(25) obtained the best results on most performance measures and were considered the most successful overall. Optimising certain performance measures did not necessarily lead to the best test performance on those particular measures, probably because of overfitting. (b) There was no inherent advantage in using ensembles composed of all the SEE models generated by the MOEA over using only the best SEE model according to each measure separately.

Conclusions: Care must be taken to prevent overfitting on the performance measures being optimised. Our results suggest that concurrently optimising LSD, MMRE and PRED(25) promoted more ensemble diversity than other combinations of measures, and hence performed best; low diversity is more likely to lead to overfitting.
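
For reference, the measures named in the abstract can be computed as follows. This is a minimal sketch assuming the widely used definitions of MMRE and PRED(25) and the logarithmic-residual formulation of LSD; the exact variants used in the paper may differ, and the function names and NumPy usage are illustrative.

    import numpy as np

    def mmre(actual, predicted):
        # Mean Magnitude of Relative Error (lower is better).
        return np.mean(np.abs(actual - predicted) / actual)

    def pred25(actual, predicted, level=0.25):
        # PRED(25): fraction of estimates whose relative error is
        # within 25% of the actual effort (higher is better).
        mre = np.abs(actual - predicted) / actual
        return np.mean(mre <= level)

    def lsd(actual, predicted):
        # Logarithmic Standard Deviation (lower is better):
        # e_i = ln(actual_i) - ln(predicted_i), s^2 = sample variance of e_i.
        e = np.log(actual) - np.log(predicted)
        s2 = np.var(e, ddof=1)
        return np.sqrt(np.sum((e + s2 / 2.0) ** 2) / (len(e) - 1))

Question (b) of the abstract contrasts two ways of building the ensemble from the MOEA's output: include every generated model, or keep only the model with the best training value on each optimised measure. The sketch below illustrates the latter; the names `pareto_models` and `measures` and the averaging combiner are assumptions for illustration, not the paper's confirmed design.

    def best_per_measure(pareto_models, X_train, y_train, measures):
        # `measures` maps a name to (function, higher_is_better).
        # Keep, for each measure, the model with the best training score.
        chosen = []
        for fn, higher_is_better in measures.values():
            scores = [fn(y_train, m.predict(X_train)) for m in pareto_models]
            pick = max if higher_is_better else min
            best = pick(range(len(scores)), key=lambda i: scores[i])
            if pareto_models[best] not in chosen:
                chosen.append(pareto_models[best])
        return chosen

    def ensemble_predict(models, X):
        # Combine member estimates by simple averaging (an assumption).
        return np.mean([m.predict(X) for m in models], axis=0)

    # Example: the three-measure combination the abstract found most successful.
    measures = {"LSD": (lsd, False), "MMRE": (mmre, False), "PRED(25)": (pred25, True)}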