Computational Statistics & Data Analysis
Constructing confidence intervals (CIs) for functions of cell probabilities (e.g., rate difference, rate ratio and odds ratio) is a standard procedure in categorical data analysis for clinical trials and medical studies. In the presence of incomplete data, existing methods can be problematic. For example, the inverse of the observed information matrix may not exist, in which case the asymptotic CIs based on the delta method are not available. Even when the inverse of the observed information matrix exists, large-sample delta methods are generally not reliable in small-sample studies. In addition, the existing expectation-maximization (EM) algorithm based on conventional data augmentation (DA) may converge slowly because it introduces too many latent variables. In this article, for r × c tables with incomplete data, we propose a novel DA scheme that requires fewer latent variables and consequently leads to a more efficient EM algorithm. We present two bootstrap-type CIs for parameters of interest via the new EM algorithm, one with and one without the normality assumption. For r × c tables with only one incomplete/supplementary margin, the improved EM algorithm converges in only one step, and the associated maximum likelihood estimates can hence be obtained in closed form. Theoretical and simulation results show that the proposed EM algorithm outperforms the existing EM algorithm. Three real data sets from a neurological study, a rheumatoid arthritis study and a wheeze study are used to illustrate the methodologies.
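To make the one-supplementary-margin case concrete, the sketch below runs a conventional EM iteration on an r × c table in which some subjects have only their row category recorded, and compares the result with the closed-form estimate obtained by factoring the likelihood into row margins and within-row conditionals. It is not the DA scheme proposed in the article, whose details are not given here. The function names (`em_row_supplement`, `closed_form_mle`), the counts, and the use of NumPy are assumptions made purely for illustration.

```python
import numpy as np


def em_row_supplement(n, m, tol=1e-12, max_iter=10_000):
    """Conventional EM for an r x c multinomial table.

    n : (r, c) counts of fully classified subjects
    m : (r,)   counts of subjects with only the row category observed
    Returns the estimated cell probabilities and the number of iterations.
    """
    n = np.asarray(n, dtype=float)
    m = np.asarray(m, dtype=float)
    total = n.sum() + m.sum()
    theta = np.full(n.shape, 1.0 / n.size)  # uniform starting value
    for it in range(1, max_iter + 1):
        # E-step: split each supplementary row count across columns
        # in proportion to the current within-row conditional probabilities.
        cond = theta / theta.sum(axis=1, keepdims=True)
        filled = n + m[:, None] * cond
        # M-step: multinomial MLE from the completed table.
        new_theta = filled / total
        if np.max(np.abs(new_theta - theta)) < tol:
            return new_theta, it
        theta = new_theta
    return theta, max_iter


def closed_form_mle(n, m):
    """Closed-form MLE for the same data: row margins use all subjects,
    within-row conditionals use only the fully classified subjects."""
    n = np.asarray(n, dtype=float)
    m = np.asarray(m, dtype=float)
    total = n.sum() + m.sum()
    row_prob = (n.sum(axis=1) + m) / total
    cond = n / n.sum(axis=1, keepdims=True)
    return row_prob[:, None] * cond


# Hypothetical counts, not taken from any of the studies mentioned above.
n_obs = [[20, 10], [5, 15]]
m_obs = [8, 12]

theta_em, n_iter = em_row_supplement(n_obs, m_obs)
theta_cf = closed_form_mle(n_obs, m_obs)
print("EM iterations:", n_iter)
print("max |EM - closed form|:", np.max(np.abs(theta_em - theta_cf)))
```

In this sketch the conventional EM needs many iterations to reach an estimate that the closed-form expression gives directly, which is the kind of gap between iterative and closed-form estimation that the one-supplementary-margin result refers to.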