Software Effort Prediction Models Using Maximum Likelihood Methods Require Multivariate Normality of the Software Metrics Data Sample: Can Such a Sample Be Made Multivariate Normal?

Authors:
Victor K. Y. Chan
Affiliations:
Macau Polytechnic Institute
Venue:
COMPSAC '04 Proceedings of the 28th Annual International Computer Software and Applications Conference - Volume 01
Year:
2004

Abstract

Missing data often appear in the software metrics data samples used to construct software effort prediction models. To date, the least biased, and thus the most strongly recommended, family of such models capable of handling missing data is the one based on maximum likelihood methods. However, the theory behind these methods assumes that the data samples underlying model construction are multivariate normal. Previous research on such models simply ignored the violation of this assumption by empirical data samples. This paper proposes and empirically illustrates a relatively simple but effective technique for transforming the data sample so that it meets the assumption. The technique is shown empirically to work for typical software metrics data samples, and the author recommends applying it in any further research on, and practical industrial application of, software effort prediction models using maximum likelihood methods.