Impact Analysis of Missing Values on the Prediction Accuracy of Analogy-based Software Effort Estimation Method AQUA

Authors:
Jingzhou Li; Ahmed Al-Emran; Guenther Ruhe
Affiliations:
University of Calgary, Canada; University of Calgary, Canada; University of Calgary, Canada
Venue:
ESEM '07 Proceedings of the First International Symposium on Empirical Software Engineering and Measurement
Year:
2007

Abstract

Effort estimation by analogy (EBA) often has to deal with missing values. Our earlier analogy-based method, AQUA, can tolerate missing values in the data set. However, it is unclear how the percentage of missing values affects prediction accuracy, and whether there is an upper bound on that percentage beyond which AQUA is no longer applicable. This paper investigates these questions through an impact analysis conducted on seven data sets of different sizes with different initial percentages of missing values. The major results are that (i) we confirm the intuition that the more missing values there are, the poorer the prediction accuracy of AQUA; (ii) there is a quadratic dependency between prediction accuracy and the percentage of missing values; and (iii) the upper limit of missing values for the applicability of AQUA is determined to be 40%. These results were obtained in the context of AQUA. Further analysis is needed for other ways of applying EBA, such as those using similarity measures or analogy adaptation methods different from the ones in AQUA; the experimental design of this study can be adapted for that purpose.
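The abstract describes the impact analysis only at a high level. The following is a minimal sketch of that kind of experiment, built on several assumptions that do not come from the paper. Missing values are injected uniformly at random. A plain k-nearest-neighbour analogy estimator stands in for AQUA; it skips features missing in either project and is not AQUA's similarity measure or adaptation method. Accuracy is measured as Pred(25) under leave-one-out. The data are synthetic. All function names (`inject_missing`, `distance`, `estimate_loo`, `pred25`, `run_experiment`) are hypothetical. The quadratic dependency is modelled with `numpy.polyfit` of degree 2.

```python
import numpy as np

# Hypothetical sketch of a missing-value impact analysis; the estimator is NOT AQUA.


def inject_missing(X, rate, rng):
    """Copy X and set round(rate * X.size) randomly chosen cells to NaN."""
    X = X.astype(float)  # astype returns a new array, so the input is untouched
    n_missing = int(round(rate * X.size))
    idx = rng.choice(X.size, size=n_missing, replace=False)
    mask = np.zeros(X.size, dtype=bool)
    mask[idx] = True
    X[mask.reshape(X.shape)] = np.nan
    return X


def distance(a, b):
    """Euclidean distance over features present in both projects, normalised
    by the number of shared features; inf if no feature is shared."""
    shared = ~np.isnan(a) & ~np.isnan(b)
    if not shared.any():
        return np.inf
    return np.sqrt(np.sum((a[shared] - b[shared]) ** 2) / shared.sum())


def estimate_loo(X, y, k=3):
    """Leave-one-out analogy estimate: mean effort of the k most similar other projects."""
    n = len(y)
    est = np.empty(n)
    for i in range(n):
        d = np.array([distance(X[i], X[j]) for j in range(n)])
        d[i] = np.inf  # exclude the project being estimated
        nearest = np.argsort(d, kind="stable")[:k]
        est[i] = y[nearest].mean()
    return est


def pred25(y, est):
    """Pred(25): share of projects whose magnitude of relative error is at most 0.25."""
    mre = np.abs(est - y) / y
    return float(np.mean(mre <= 0.25))


def run_experiment(seed=0, n=60, p=5, reps=3):
    rng = np.random.default_rng(seed)
    X_full = rng.uniform(0.0, 1.0, size=(n, p))  # features already scaled to [0, 1]
    # Synthetic effort that grows with the features, plus noise; floored at 1 so MRE is defined.
    y = np.maximum(100 + 400 * X_full.mean(axis=1) + rng.normal(0, 20, size=n), 1.0)
    rates = np.linspace(0.0, 0.6, 13)  # 0%, 5%, ..., 60% missing
    scores = np.array([
        np.mean([pred25(y, estimate_loo(inject_missing(X_full, r, rng), y))
                 for _ in range(reps)])
        for r in rates
    ])
    coeffs = np.polyfit(rates, scores, deg=2)  # scores ~ a*r**2 + b*r + c
    return rates, scores, coeffs


if __name__ == "__main__":
    rates, scores, (a, b, c) = run_experiment()
    for r, s in zip(rates, scores):
        print(f"{r:.0%} missing -> Pred(25) = {s:.2f}")
    print(f"quadratic fit: {a:.3f}*r^2 + {b:.3f}*r + {c:.3f}")
```

Injecting values uniformly at random assumes data are missing completely at random. Real repositories rarely behave that way, so a faithful replication would need the paper's own deletion scheme, estimator and accuracy measure. Normalising the distance by the number of shared features keeps project pairs that happen to share fewer features from looking artificially similar.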