A flexible method for software effort estimation by analogy

  • Authors:
  • Jingzhou Li;Guenther Ruhe;Ahmed Al-Emran;Michael M. Richter

  • Affiliations:
  • Software Engineering Decision Support Laboratory, University of Calgary, Calgary, Canada T2N1N4;Software Engineering Decision Support Laboratory, University of Calgary, Calgary, Canada T2N1N4;Software Engineering Decision Support Laboratory, University of Calgary, Calgary, Canada T2N1N4;TU Kaiserslautern, FB Informatik, Kaiserslautern, Germany 67653

  • Venue:
  • Empirical Software Engineering
  • Year:
  • 2007

Abstract

Effort estimation by analogy uses information from former similar projects to predict the effort for a new project. Existing analogy-based methods are limited by their inability to handle non-quantitative data and missing values. Their prediction accuracy also needs improvement. In this paper, we propose a new flexible method called AQUA that overcomes these limitations. AQUA combines ideas from two known analogy-based estimation techniques: case-based reasoning and collaborative filtering. The method can be applied to predict the effort related to any object at the requirement, feature, or project level. AQUA makes four main contributions compared with other methods. First, AQUA supports non-quantitative data by defining similarity measures for different data types. Second, it is able to tolerate missing values. Third, the results of an explorative study in this paper show that the prediction accuracy is sensitive to both the number N of analogies (similar objects) taken for adaptation and the threshold T for the degree of similarity, especially for larger data sets. A fixed and small number of analogies, as assumed in existing analogy-based methods, may not produce the best prediction accuracy. Fourth, a flexible mechanism based on learning from existing data is proposed for determining the values of N and T that are likely to offer the best prediction accuracy. New criteria to measure the quality of prediction are also proposed. AQUA was validated against two internal data sets and one public domain data set with non-quantitative attributes and missing values. The obtained results are encouraging. In addition, a comparative analysis with existing analogy-based estimation methods was conducted using three publicly available data sets that were used by these methods. In two of the three cases, AQUA outperformed all other methods.
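
The abstract does not give AQUA's similarity measures, adaptation rule, or quality criteria, so the following is only a minimal Python sketch of the general idea under stated assumptions: a per-attribute similarity that depends on the data type, missing values that are skipped rather than imputed, and an estimate adapted from at most N analogies whose similarity reaches a threshold T. All function names, the range-normalized numeric similarity, the similarity-weighted mean, and the use of leave-one-out mean relative error to pick N and T are illustrative choices, not taken from the paper.

```python
from typing import Optional

def local_similarity(a, b, value_range=None) -> Optional[float]:
    """Similarity in [0, 1] for one attribute; None if either value is missing."""
    if a is None or b is None:
        return None  # assumption: a missing value contributes nothing
    if isinstance(a, (int, float)) and isinstance(b, (int, float)) and value_range:
        # assumption: numeric similarity is 1 minus the range-normalized distance
        return max(0.0, 1.0 - abs(a - b) / value_range)
    return 1.0 if a == b else 0.0  # assumption: exact match for other types

def global_similarity(x: dict, y: dict, ranges: dict) -> float:
    """Mean of the local similarities over attributes known in both objects."""
    sims = [local_similarity(x.get(k), y.get(k), ranges.get(k)) for k in x]
    sims = [s for s in sims if s is not None]
    return sum(sims) / len(sims) if sims else 0.0

def estimate_effort(target: dict, history: list, ranges: dict,
                    n: int, t: float) -> Optional[float]:
    """Similarity-weighted mean effort of at most n analogies with similarity >= t."""
    scored = [(global_similarity(target, attrs, ranges), effort)
              for attrs, effort in history]
    analogies = sorted((p for p in scored if p[0] >= t),
                       key=lambda p: p[0], reverse=True)[:n]
    total = sum(s for s, _ in analogies)
    if total == 0:
        return None  # no usable analogy at this threshold
    return sum(s * e for s, e in analogies) / total

def tune_n_t(history: list, ranges: dict, n_values, t_values):
    """Leave-one-out grid search; returns (mean relative error, n, t) or None.

    Assumes every recorded effort is positive. Objects for which no analogy
    is found are skipped when averaging, which is a simplification.
    """
    best = None
    for n in n_values:
        for t in t_values:
            errors = []
            for i, (attrs, actual) in enumerate(history):
                rest = history[:i] + history[i + 1:]
                est = estimate_effort(attrs, rest, ranges, n, t)
                if est is not None:
                    errors.append(abs(actual - est) / actual)
            if errors:
                score = sum(errors) / len(errors)
                if best is None or score < best[0]:
                    best = (score, n, t)
    return best

# Hypothetical historical objects: (attributes, actual effort in person-hours).
history = [
    ({"size": 100, "lang": "java", "team": 5}, 400.0),
    ({"size": 120, "lang": "java", "team": None}, 500.0),
    ({"size": 300, "lang": "cobol", "team": 9}, 2000.0),
]
ranges = {"size": 200, "team": 10}  # assumed value ranges for numeric attributes
new_object = {"size": 100, "lang": "java", "team": None}

print(estimate_effort(new_object, history, ranges, n=2, t=0.5))
print(tune_n_t(history, ranges, n_values=[1, 2], t_values=[0.5, 0.9]))
```

In this sketch, changing `n` or `t` changes which historical objects contribute to the estimate, which is the kind of sensitivity the abstract reports. The grid search is one simple way to choose both values from existing data.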