Estimating the number of components (the order) in a mixture model is often addressed using criteria such as the Bayesian information criterion (BIC) and minimum message length (MML). However, when the feature space is very large, these criteria may grossly underestimate the order. Here, it is suggested that this failure is not mainly attributable to the criterion (e.g., BIC), but rather to the lack of structure in standard mixtures, which trade off data fitness against model complexity only by varying the order. The authors of the present paper propose mixtures with a richer set of trade-offs: each component is allowed its own informative feature subset, while all other features are explained by a common model shared by all components. This parameter sharing greatly reduces model complexity at a given order. Because the space of such parsimonious models is vast, it is searched efficiently by integrating component and feature selection into the generalized expectation-maximization (GEM) learning of the mixture parameters.

The quality of the proposed (unsupervised) solutions is evaluated using both classification error and test-set data likelihood. On text data, the proposed multinomial version, learned without labeled examples, without knowledge of the "true" number of topics, and without feature preprocessing, compares favorably both with alternative unsupervised methods and with a supervised naive Bayes classifier. A Gaussian version compares favorably with a recent method that introduces "feature saliency" in mixtures.
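The baseline the abstract argues against is easy to state in code. Below is a minimal sketch, assuming scikit-learn and NumPy, of standard BIC-based order selection over full (unstructured) mixtures; the function name and synthetic data are illustrative, not from the paper.

```python
# Standard BIC order selection for a plain Gaussian mixture: fit models of
# increasing order and keep the BIC minimizer. This is the baseline the
# abstract says underestimates the order when the feature space is large,
# because every added component pays a penalty proportional to the full
# feature dimension D, pushing the criterion toward small orders.
import numpy as np
from sklearn.mixture import GaussianMixture

def select_order_bic(X, max_order=10, seed=0):
    best_order, best_bic = None, np.inf
    for k in range(1, max_order + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="diag",
                              random_state=seed).fit(X)
        bic = gmm.bic(X)  # -2 log-likelihood + (#params) * log N
        if bic < best_bic:
            best_order, best_bic = k, bic
    return best_order

# Two well-separated clusters in 50 dimensions; BIC recovers order 2 here.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, size=(100, 50)) for m in (0.0, 3.0)])
print(select_order_bic(X))
```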
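The parsimonious structure itself can be made concrete with a small likelihood computation. The following NumPy/SciPy sketch is a reconstruction from the abstract's description, not the authors' code: a binary mask `v[k, j]` marks feature `j` as informative for component `k`, and masked-out features are scored under a single shared density, so their parameters add no per-component complexity.

```python
# Log-likelihood of a parsimonious Gaussian mixture in which each component
# models only its selected feature subset; all other features fall back to
# one shared density. A sketch under stated assumptions, not the paper's
# implementation.
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def parsimonious_loglik(X, weights, mu, sigma, mu0, sigma0, v):
    """X: (N, D); weights: (K,); mu, sigma, v: (K, D); mu0, sigma0: (D,)."""
    comp = norm.logpdf(X[:, None, :], mu, sigma)       # component-specific
    shared = norm.logpdf(X[:, None, :], mu0, sigma0)   # shared common model
    per_feature = np.where(v.astype(bool), comp, shared)  # (N, K, D)
    log_joint = np.log(weights) + per_feature.sum(axis=2)  # (N, K)
    return logsumexp(log_joint, axis=1).sum()

# Illustrative use: each component claims a small informative subset.
rng = np.random.default_rng(0)
N, K, D = 200, 2, 20
X = rng.normal(size=(N, D))
weights = np.full(K, 1.0 / K)
mu, sigma = rng.normal(size=(K, D)), np.ones((K, D))
mu0, sigma0 = np.zeros(D), np.ones(D)
v = np.zeros((K, D))
v[0, :3] = 1.0   # component 0: features 0-2 informative
v[1, 3:6] = 1.0  # component 1: features 3-5 informative
print(parsimonious_loglik(X, weights, mu, sigma, mu0, sigma0, v))
```

In GEM learning as the abstract describes it, a quantity of this form would be maximized by alternating responsibility updates with joint updates of the parameters and the masks, so varying `v` trades off fit against complexity without changing the order.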