Software Effort Prediction Models Using Maximum Likelihood Methods Require Multivariate Normality of the Software Metrics Data Sample: Can Such a Sample Be Made Multivariate Normal?

  • Authors:
  • Victor K. Y. Chan

  • Affiliations:
  • Macau Polytechnic Institute

  • Venue:
  • COMPSAC '04 Proceedings of the 28th Annual International Computer Software and Applications Conference - Volume 01
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Missing data often appear in software metrics data samples used to construct software effort prediction models1. So far, the least biased and thus the most strongly recommended family of such models capable of handling missing data are those using maximum likelihood methods. However, the theory of such maximum likelihood methods assumes that the data samples underlying the model construction are multivariate normal. Previous researches on such models simply ignored the violation of such an assumption by the empirical data samples. This paper proposes and empirically illustrates a not-so-complicated but effective technique to transform the data sample for the purpose of meeting such an assumption. This technique is empirically proven to work for typical software metrics data samples and the author recommends applying such a technique in any further researches on and practical industrial application of software effort prediction models using maximum likelihood methods.