IEEE Transactions on Software Engineering - Special section on the seventh international software metrics symposium
Applying statistical methodology to optimize and simplify software metric models with missing data
Proceedings of the 2006 ACM symposium on Applied computing
Outlier elimination in construction of software metric models
Proceedings of the 2007 ACM symposium on Applied computing
Incomplete, or missing, data is likely to be encountered in empirical software engineering data sets. In this paper we evaluate several methods for handling missing data. The methods are first presented and discussed in general and then applied to effort estimation of ERP projects. We found that two sampling-based methods, mean imputation (MI) and similar response pattern imputation (SRPI), waste less information than listwise deletion (LD). However, MI may introduce more bias than SRPI. Compared to sampling-based methods, likelihood-based imputation methods require data sets too large to be realistic in empirical software engineering. None of the sampling-based methods, such as MI and SRPI, seems able to correct bias. So, although imputation is an attractive idea, the available methods still have severe limitations.
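To make the three sampling-based approaches named in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes missing values are encoded as `None`, that every column has at least one observed value, and that at least one complete row exists. The SRPI variant shown is a simplified reading of the idea: it copies missing values from the complete row closest in squared Euclidean distance over the observed columns. The function names and the toy project data (size, team size, effort) are hypothetical.

```python
from statistics import mean


def listwise_deletion(rows):
    """LD: keep only the rows that have no missing (None) value."""
    return [list(row) for row in rows if None not in row]


def mean_imputation(rows):
    """MI: replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = [
        mean(row[j] for row in rows if row[j] is not None) for j in range(n_cols)
    ]
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


def similar_pattern_imputation(rows):
    """Simplified SRPI (assumption): fill gaps from the most similar complete row."""
    complete = listwise_deletion(rows)  # candidate donor rows
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        observed = [j for j, value in enumerate(row) if value is not None]
        # Donor = complete row with the smallest squared distance on observed columns.
        donor = min(
            complete,
            key=lambda cand: sum((cand[j] - row[j]) ** 2 for j in observed),
        )
        filled.append(
            [donor[j] if value is None else value for j, value in enumerate(row)]
        )
    return filled


# Hypothetical toy data: [size, team size, effort]; None marks a missing value.
projects = [
    [10.0, 2.0, 100.0],
    [20.0, 4.0, 210.0],
    [None, 4.0, 190.0],
    [30.0, None, 320.0],
]

ld_rows = listwise_deletion(projects)
mi_rows = mean_imputation(projects)
srpi_rows = similar_pattern_imputation(projects)
```

On this toy input, LD keeps only two of the four projects, which illustrates the loss of information the abstract refers to, while MI and SRPI retain all four rows but fill the gaps with different values.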