Maximum Likelihood Estimation of Mixture Densities for Binned and Truncated Multivariate Data

Authors:
Igor V. Cadez; Padhraic Smyth; Geoff J. McLachlan; Christine E. McLaren
Affiliations:
Department of Information and Computer Science, University of California, Irvine, CA 92697, USA. icadez@ics.uci.edu; Department of Information and Computer Science, University of California, Irvine, CA 92697, USA. smyth@ics.uci.edu; Department of Mathematics, The University of Queensland, Brisbane, Australia; Division of Epidemiology, Department of Medicine, University of California, Irvine, CA 92697, USA
Venue:
Machine Learning - Special issue: Unsupervised learning
Year:
2002

Abstract

Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44(2), 571–578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure, so a naive implementation can be computationally expensive. To reduce the computational cost, a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that, with a sufficient number of bins and data points, it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to the diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.
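
To make the role of the per-bin integrals concrete, here is a minimal Python sketch of the quantity that an EM procedure for binned and truncated data would need to evaluate. It is an illustration under simplifying assumptions, not the authors' implementation: the bins are axis-aligned rectangles and every mixture component has a diagonal covariance, so each integral factorizes into a product of one-dimensional normal CDF differences. With full covariances the integrals would need numerical integration instead. The function names (normal_cdf, bin_mass, binned_truncated_loglik) are hypothetical.

```python
import math


def normal_cdf(x, mean, std):
    """CDF of a univariate normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def bin_mass(bin_edges, weights, means, stds):
    """Mixture probability mass inside one axis-aligned rectangular bin.

    bin_edges: list of (low, high) pairs, one pair per dimension.
    ASSUMPTION: diagonal covariances, so each component's integral over
    the bin is a product of one-dimensional CDF differences.
    """
    total = 0.0
    for w, mu, sd in zip(weights, means, stds):
        mass = w
        for (lo, hi), m, s in zip(bin_edges, mu, sd):
            mass *= normal_cdf(hi, m, s) - normal_cdf(lo, m, s)
        total += mass
    return total


def binned_truncated_loglik(bins, counts, weights, means, stds):
    """Multinomial log-likelihood of the bin counts, up to a constant.

    Truncation is handled by renormalizing each bin mass by the total
    mass of the observed bins, so mass outside them is ignored.
    """
    masses = [bin_mass(b, weights, means, stds) for b in bins]
    observed = sum(masses)
    return sum(n * math.log(p / observed)
               for n, p in zip(counts, masses) if n > 0)


# Four quadrant bins covering [-1, 1] x [-1, 1]; data outside are truncated.
bins = [
    [(-1.0, 0.0), (-1.0, 0.0)],
    [(-1.0, 0.0), (0.0, 1.0)],
    [(0.0, 1.0), (-1.0, 0.0)],
    [(0.0, 1.0), (0.0, 1.0)],
]
counts = [1, 1, 1, 1]

# A single standard-normal component centered at the origin.
loglik = binned_truncated_loglik(bins, counts, [1.0], [(0.0, 0.0)], [(1.0, 1.0)])
```

In this toy setup the four bins are symmetric about the component mean, so after renormalization each bin has probability 1/4. The sketch evaluates the likelihood only. An EM iteration would also need the per-bin expected sufficient statistics, which involve further integrals over the same bins.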