On improved EM algorithm and confidence interval construction for incomplete r×c tables

Authors:
Man-Lai Tang;Kai Wang Ng;Guo-Liang Tian;Ming Tan
Affiliations:
Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong, PR China;Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong, PR China;Division of Biostatistics, University of Maryland Greenebaum Cancer Center, 22 South Greene Street, Baltimore, MD 21201, USA;Division of Biostatistics, University of Maryland Greenebaum Cancer Center, 22 South Greene Street, Baltimore, MD 21201, USA
Venue:
Computational Statistics & Data Analysis
Year:
2007

Abstract

Constructing confidence intervals (CIs) for functions of cell probabilities (e.g., rate difference, rate ratio and odds ratio) is a standard procedure in categorical data analysis for clinical trials and medical studies. In the presence of incomplete data, existing methods can be problematic. For example, the inverse of the observed information matrix may not exist, in which case asymptotic CIs based on the delta method are not available. Even when the inverse exists, large-sample delta methods are generally unreliable in small-sample studies. In addition, the existing expectation-maximization (EM) algorithm based on conventional data augmentation (DA) may converge slowly because it introduces too many latent variables. In this article, for r×c tables with incomplete data, we propose a novel DA scheme that requires fewer latent variables and consequently leads to a more efficient EM algorithm. We present two bootstrap-type CIs for parameters of interest via the new EM algorithm, with and without the normality assumption. For r×c tables with only one incomplete/supplementary margin, the improved EM algorithm converges in only one step, and the associated maximum likelihood estimates can hence be obtained in closed form. Theoretical and simulation results show that the proposed EM algorithm outperforms the existing EM algorithm. Three real data sets, from a neurological study, a rheumatoid arthritis study and a wheeze study, are used to illustrate the methodologies.
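
To make the setting concrete, the following is a minimal Python sketch, not the authors' DA scheme (which the abstract does not spell out). It assumes a table in which some subjects are fully classified and the remaining subjects have only their row category recorded, with missingness that can be ignored. Under those assumptions the maximum likelihood estimate has a well-known closed form, which the sketch compares with a textbook EM iteration that allocates the supplementary row counts across columns. A parametric percentile bootstrap for the odds ratio is included as one possible bootstrap-type interval. All function names and the example counts are hypothetical.

```python
import numpy as np

def closed_form_mle(full, row_supp):
    """Closed-form MLE when extra subjects have only the row observed.

    Assumes every row of `full` has a positive total.
    """
    full = np.asarray(full, dtype=float)
    row_supp = np.asarray(row_supp, dtype=float)
    row_tot = full.sum(axis=1)
    # P(row i) uses all subjects; P(col j | row i) uses fully classified ones.
    p_row = (row_tot + row_supp) / (full.sum() + row_supp.sum())
    p_col_given_row = full / row_tot[:, None]
    return p_row[:, None] * p_col_given_row

def textbook_em(full, row_supp, tol=1e-12, max_iter=10_000):
    """Plain EM for the same model; returns (estimate, iterations used)."""
    full = np.asarray(full, dtype=float)
    row_supp = np.asarray(row_supp, dtype=float)
    n_all = full.sum() + row_supp.sum()
    theta = np.full(full.shape, 1.0 / full.size)  # uniform starting value
    for it in range(1, max_iter + 1):
        # E-step: split each supplementary row count across the columns.
        cond = theta / theta.sum(axis=1, keepdims=True)
        expected = full + row_supp[:, None] * cond
        # M-step: multinomial MLE from the completed table.
        new_theta = expected / n_all
        if np.max(np.abs(new_theta - theta)) < tol:
            return new_theta, it
        theta = new_theta
    return theta, max_iter

def odds_ratio(theta):
    """Odds ratio of a 2x2 table of cell probabilities."""
    return theta[0, 0] * theta[1, 1] / (theta[0, 1] * theta[1, 0])

def bootstrap_percentile_ci(full, row_supp, stat, n_boot=2000, level=0.95, seed=0):
    """Parametric percentile bootstrap CI for stat(theta)."""
    rng = np.random.default_rng(seed)
    full = np.asarray(full)
    row_supp = np.asarray(row_supp)
    theta_hat = closed_form_mle(full, row_supp)
    n, m = int(full.sum()), int(row_supp.sum())
    reps = np.empty(n_boot)
    for b in range(n_boot):
        # Resample both parts of the data from the fitted model.
        full_b = rng.multinomial(n, theta_hat.ravel()).reshape(full.shape)
        supp_b = rng.multinomial(m, theta_hat.sum(axis=1))
        reps[b] = stat(closed_form_mle(full_b, supp_b))
    alpha = 1.0 - level
    lo, hi = np.percentile(reps, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Hypothetical counts: 80 fully classified subjects, 30 with row only.
full_counts = np.array([[30, 10], [15, 25]])
row_only = np.array([20, 10])

theta_closed = closed_form_mle(full_counts, row_only)
theta_em, n_iter = textbook_em(full_counts, row_only)
ci_low, ci_high = bootstrap_percentile_ci(full_counts, row_only, odds_ratio)
print(theta_closed, n_iter, (ci_low, ci_high))
```

In this sketch the textbook iteration needs many passes to reach the value that the closed form gives directly, which is the kind of gap a one-step EM would remove. The percentile interval makes no normality assumption; a normal-based variant would instead use the bootstrap standard error around the point estimate.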