Comparing cost prediction models by resampling techniques

  • Authors:
  • Nikolaos Mittas;Lefteris Angelis

  • Affiliations:
  • Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece;Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

  • Venue:
  • Journal of Systems and Software
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

The accurate software cost prediction is a research topic that has attracted much of the interest of the software engineering community during the latest decades. A large part of the research efforts involves the development of statistical models based on historical data. Since there are a lot of models that can be fitted to certain data, a crucial issue is the selection of the most efficient prediction model. Most often this selection is based on comparisons of various accuracy measures that are functions of the model's relative errors. However, the usual practice is to consider as the most accurate prediction model the one providing the best accuracy measure without testing if this superiority is in fact statistically significant. This policy can lead to unstable and erroneous conclusions since a small change in the data is able to turn over the best model selection. On the other hand, the accuracy measures used in practice are statistics with unknown probability distributions, making the testing of any hypothesis, by the traditional parametric methods, problematic. In this paper, the use of statistical simulation tools is proposed in order to test the significance of the difference between the accuracy of two prediction methods: regression and estimation by analogy. The statistical simulation procedures involve permutation tests and bootstrap techniques for the construction of confidence intervals for the difference of measures. Four known datasets are used for experimentation in order to validate the results and make comparisons between the simulation methods and the traditional parametric and non-parametric procedures.