Practical Identifiability of Finite Mixtures of Multivariate Bernoulli Distributions

Authors:
Miguel Á. Carreira-Perpiñán; Steve Renals
Venue:
Neural Computation
Year:
2000

Citing 1

Dimensionality reduction of electropalatographic data using latent variable models
Speech Communication

Cited 11

Topics in 0-1 data
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

Top-Down Parameter-Free Clustering of High-Dimensional Categorical Data
IEEE Transactions on Knowledge and Data Engineering

Discriminating self from non-self with finite mixtures of multivariate Bernoulli distributions
Proceedings of the 10th annual conference on Genetic and evolutionary computation

Minimum Information Loss Cluster Analysis for Categorical Data
MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition

M&M: multi-level Markov model for wireless link simulations
Proceedings of the 7th ACM Conference on Embedded Networked Sensor Systems

Mixture modeling of DNA copy number amplification patterns in cancer
IWANN'07 Proceedings of the 9th international work conference on Artificial neural networks

Tractable multivariate binary density estimation and the restricted Boltzmann forest
Neural Computation

EM cluster analysis for categorical data
SSPR'06/SPR'06 Proceedings of the 2006 joint IAPR international conference on Structural, Syntactic, and Statistical Pattern Recognition

Adaptation of a mixture of multivariate Bernoulli distributions
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Two

Machine learning using Bernoulli mixture models: Clustering, rule extraction and dimensionality reduction
Neurocomputing

Improving wireless link simulation using multilevel Markov models
ACM Transactions on Sensor Networks (TOSN)

Abstract

The class of finite mixtures of multivariate Bernoulli distributions is known to be nonidentifiable; that is, different values of the mixture parameters can correspond to exactly the same probability distribution. In principle, this would mean that sample estimates using this model would give rise to different interpretations. We give empirical support to the fact that estimation of this class of mixtures can still produce meaningful results in practice, thus lessening the importance of the identifiability problem. We also show that the expectation-maximization algorithm is guaranteed to converge to a proper maximum likelihood estimate, owing to a property of the log-likelihood surface. Experiments with synthetic data sets show that an original generating distribution can be estimated from a sample. Experiments with an electropalatography data set show important structure in the data.
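
The two ideas in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names (`component_prob`, `mixture_pmf`, `em_step`, `sample`), the parameter values, and the toy data are all assumptions chosen for illustration. The first part builds two different parameter sets for a two-component mixture over two binary variables that assign the same probability to every binary vector, which is what nonidentifiability means here. The second part runs the standard EM updates for a Bernoulli mixture on synthetic data and records the log-likelihood at each iteration. It only shows the usual non-decreasing behaviour of EM; it does not reproduce the paper's argument about the log-likelihood surface.

```python
import itertools
import math
import random


def component_prob(x, p):
    """P(x | component): product of independent Bernoullis with means p."""
    prob = 1.0
    for xd, pd in zip(x, p):
        prob *= pd if xd == 1 else (1.0 - pd)
    return prob


def mixture_pmf(x, weights, protos):
    """P(x) under a finite mixture of multivariate Bernoulli distributions."""
    return sum(w * component_prob(x, p) for w, p in zip(weights, protos))


def log_likelihood(data, weights, protos):
    return sum(math.log(mixture_pmf(x, weights, protos)) for x in data)


def em_step(data, weights, protos):
    """One EM iteration: responsibilities (E-step), closed-form updates (M-step)."""
    n, dim, m = len(data), len(data[0]), len(weights)
    resp = []
    for x in data:
        joint = [w * component_prob(x, p) for w, p in zip(weights, protos)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    new_weights, new_protos = [], []
    for k in range(m):
        nk = sum(r[k] for r in resp)  # effective number of points in component k
        new_weights.append(nk / n)
        new_protos.append(
            [sum(r[k] * x[d] for r, x in zip(resp, data)) / nk for d in range(dim)]
        )
    return new_weights, new_protos


def sample(n, weights, protos, rng):
    """Draw n binary vectors from the mixture (illustrative helper)."""
    data = []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        data.append(tuple(1 if rng.random() < pd else 0 for pd in protos[k]))
    return data


# Part 1: two different parameter sets, same distribution over {0,1}^2.
# (Assumed toy values; the components differ only in the first dimension.)
params_a = ([0.5, 0.5], [[0.2, 0.5], [0.8, 0.5]])
params_b = ([0.5, 0.5], [[0.4, 0.5], [0.6, 0.5]])
max_gap = max(
    abs(mixture_pmf(x, *params_a) - mixture_pmf(x, *params_b))
    for x in itertools.product([0, 1], repeat=2)
)

# Part 2: EM on synthetic data from an assumed generating mixture.
rng = random.Random(0)
true_weights = [0.4, 0.6]
true_protos = [[0.9, 0.8, 0.1], [0.1, 0.2, 0.9]]
data = sample(300, true_weights, true_protos, rng)

weights = [0.5, 0.5]
protos = [[0.6, 0.6, 0.4], [0.4, 0.4, 0.6]]  # asymmetric start breaks ties
lls = [log_likelihood(data, weights, protos)]
for _ in range(30):
    weights, protos = em_step(data, weights, protos)
    lls.append(log_likelihood(data, weights, protos))

print("largest pmf gap between the two parameter sets:", max_gap)
print("log-likelihood, first and last iteration:", lls[0], lls[-1])
print("estimated weights:", weights)
print("estimated prototypes:", protos)
```

Comparing the estimated weights and prototypes with `true_weights` and `true_protos` (up to a relabelling of the components) gives a rough, small-scale version of the synthetic-data experiment mentioned in the abstract.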