How effective is Tabu search to configure support vector regression for effort estimation?

  • Authors:
  • A. Corazza;S. Di Martino;F. Ferrucci;C. Gravino;F. Sarro;E. Mendes

  • Affiliations:
  • University of Napoli "Federico II", Napoli, Italy;University of Napoli "Federico II", Napoli, Italy;University of Salerno, Fisciano (SA), Italy;University of Salerno, Fisciano (SA), Italy;University of Salerno, Fisciano (SA), Italy;University of Auckland, Auckland, New Zealand

  • Venue:
  • Proceedings of the 6th International Conference on Predictive Models in Software Engineering
  • Year:
  • 2010

Abstract

Background. Recent studies have shown that Support Vector Regression (SVR) has promising potential in the field of effort estimation. However, applying SVR requires carefully setting some parameters that heavily affect the prediction accuracy. No general guidelines are available for selecting these parameters, whose choice also depends on the characteristics of the data set used. This motivates the work described in this paper. Aims. We investigated the use of an optimization technique in combination with SVR to select a suitable set of parameter values for effort estimation. The technique is Tabu Search (TS), a meta-heuristic approach that has been used to address several optimization problems. Method. We employed SVR with linear and RBF kernels, and applied variable preprocessing strategies (i.e., a logarithmic transformation). As for the data set, we employed the Tukutuku cross-company database, which is widely adopted in Web effort estimation studies, and performed a hold-out validation using two different splits of the data set. As a benchmark, the results are compared to those obtained with Manual StepWise Regression, Case-Based Reasoning, and Bayesian Networks. Results. Our results show that TS provides a good choice of parameters, so that the combination of TS and SVR outperforms every other technique applied to this data set. Conclusions. The use of the meta-heuristic Tabu Search allowed us to obtain (I) an automatic choice of the parameters required to run SVR, and (II) a significant improvement in prediction accuracy for SVR. While there is no guarantee that this is the global optimum, the results presented here are the best performance obtained so far on the problem at hand. Of course, the experimental results presented here should be assessed on further data. However, they are interesting enough to suggest the use of SVR among the techniques that are suitable for effort estimation, especially when using a cross-company database.
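
The general idea of using Tabu Search to pick SVR parameters can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the tuned parameters, the search ranges, the neighbourhood, or the tabu tenure, so the parameters (C, epsilon, and the RBF gamma), the log-scale grids, and all function names below are assumptions made for illustration. The objective is a toy stand-in; in a real setting it would train SVR with the decoded parameters and return an error measure on the hold-out set.

```python
import itertools

# Hypothetical log2-scale grids for C, epsilon and gamma (not from the paper).
C_EXP = range(-3, 8)
EPS_EXP = range(-8, 1)
GAMMA_EXP = range(-8, 3)
EXPONENTS = (C_EXP, EPS_EXP, GAMMA_EXP)


def decode(point):
    """Map an index triple to the actual (C, epsilon, gamma) values."""
    return tuple(2.0 ** EXPONENTS[d][i] for d, i in enumerate(point))


def neighbours(point):
    """Index triples that differ from `point` by one step in one dimension."""
    for dim, step in itertools.product(range(len(point)), (-1, 1)):
        idx = point[dim] + step
        if 0 <= idx < len(EXPONENTS[dim]):
            yield point[:dim] + (idx,) + point[dim + 1:]


def tabu_search(objective, start, iterations=50, tenure=7):
    """Minimise `objective` over the grid; return (best_point, best_cost)."""
    current = start
    best, best_cost = start, objective(start)
    tabu = [start]  # short-term memory of recently visited points
    for _ in range(iterations):
        candidates = []
        for cand in neighbours(current):
            cost = objective(cand)
            # Aspiration: a tabu point is allowed if it beats the best so far.
            if cand not in tabu or cost < best_cost:
                candidates.append((cost, cand))
        if not candidates:
            break
        # Take the best admissible neighbour even if it is worse than the
        # current point; this is what lets the search leave local minima.
        cost, current = min(candidates)
        tabu.append(current)
        if len(tabu) > tenure:
            tabu.pop(0)  # forget the oldest entry
        if cost < best_cost:
            best, best_cost = current, cost
    return best, best_cost


def toy_objective(point):
    """Stand-in for a hold-out error; a real one would train SVR here."""
    kc, ke, kg = (EXPONENTS[d][i] for d, i in enumerate(point))
    return (kc - 3) ** 2 + (ke + 4) ** 2 + (kg + 2) ** 2


best, cost = tabu_search(toy_objective, start=(0, 0, 0))
print(decode(best), cost)
```

The tabu list only forbids revisiting recent points, so the search keeps moving after it reaches a minimum while the best point seen is stored separately. For a linear kernel the gamma dimension would simply be dropped from the grid.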