Consistency, bias and efficiency of the normal-distribution-based MLE: The role of auxiliary variables

  • Authors: Ke-Hai Yuan; Victoria Savalei

  • Venue: Journal of Multivariate Analysis
  • Year: 2014

Abstract

Normal-distribution-based maximum likelihood (NML) is the most widely used method for missing data analysis, although real data seldom follow a normal distribution. When data are missing at random (MAR), recent results indicate that NML estimates (NMLEs) remain consistent for nonnormally distributed populations as long as the variables are linearly related. However, NMLEs are generally not consistent when the variables are nonlinearly related in the population. Similarly, NMLEs are generally not consistent when data are missing not at random (MNAR). It is well known that including proper auxiliary variables mitigates the bias in MLEs caused by an MNAR mechanism. For nonlinear relationships among the manifest variables under an MAR mechanism, the article presents a theoretical result showing that NMLEs are still consistent when proper nonlinear functions of the observed variables are included as auxiliary variables. Empirical results indicate that including auxiliary variables reduces bias in the estimates, but may also increase their standard errors substantially when the sample size is small and the proportion of missing data is not trivial. Empirical results also imply that bias in NMLEs due to a nonnormally distributed population under an MAR mechanism can be considerably greater than bias caused by an MNAR mechanism with a normally distributed population. The article also discusses how to select auxiliary variables in practice.
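
The claim about nonlinear auxiliary variables can be illustrated with a small simulation. The sketch below is not taken from the article. It assumes a toy population in which y = x² + noise (so E[y] = 1), x is always observed, and y is recorded only when x ≤ 0.5, which is an MAR mechanism because missingness depends on x alone. For this monotone pattern it uses the standard closed form for the normal-theory estimate of E[y]: regress y on the fully observed variables among the complete cases, then evaluate the fit at the full-sample means. The helper name `nml_mean_y`, the cutoff, and the sample size are all illustrative assumptions.

```python
import numpy as np

def nml_mean_y(y, predictors, observed):
    """Normal-theory ML estimate of E[y] for a monotone missing pattern.

    y          : response; values where `observed` is False are ignored
    predictors : (n, p) array of fully observed variables
    observed   : boolean mask, True where y was recorded
    """
    # Regress y on the predictors using complete cases only.
    design = np.column_stack([np.ones(observed.sum()), predictors[observed]])
    coef, *_ = np.linalg.lstsq(design, y[observed], rcond=None)
    # Evaluate the fitted line at the full-sample predictor means.
    full_means = np.concatenate([[1.0], predictors.mean(axis=0)])
    return float(full_means @ coef)

rng = np.random.default_rng(0)
n = 200_000
x = rng.standard_normal(n)
y = x**2 + rng.standard_normal(n)   # nonlinear relation, E[y] = 1
observed = x <= 0.5                 # assumed MAR rule: depends only on x

# Model with x only: the linear-normal working model is misspecified.
est_plain = nml_mean_y(y, x[:, None], observed)
# Model with x and the nonlinear auxiliary variable x**2.
est_aux = nml_mean_y(y, np.column_stack([x, x**2]), observed)

print(f"true mean 1.0, without auxiliary: {est_plain:.3f}, with x^2: {est_aux:.3f}")
```

In this setup the estimate without the auxiliary variable should land far from 1, while the estimate that includes x² should land close to it, because x² makes the regression of y on the observed variables linear. The large sample hides the cost the abstract warns about: with small samples and substantial missingness, the extra variable can inflate standard errors.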