Penalized Model-Based Clustering with Application to Variable Selection

Authors:
Wei Pan; Xiaotong Shen
Venue:
The Journal of Machine Learning Research
Year:
2007

Abstract

Variable selection in clustering analysis is both challenging and important. In the context of model-based clustering analysis with a common diagonal covariance matrix, which is especially suitable for "high dimension, low sample size" settings, we propose a penalized likelihood approach with an L1 penalty function that automatically performs variable selection via thresholding and delivers a sparse solution. We derive an EM algorithm to fit the proposed model and propose a modified BIC as a model selection criterion for choosing the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method.
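The approach described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' code, and it rests on assumptions about the usual setup for this kind of model. First, the data are standardized so that each variable has mean 0 and variance 1. Second, the penalty is lambda * sum_i sum_k |mu_ik| on the component means. Third, each mean's M-step soft-thresholds its unpenalized weighted mean with threshold lambda * sigma_k^2 / sum_j tau_ij. Under these assumptions, a variable whose mean is shrunk to exactly 0 in every component carries no cluster information and is effectively deselected. The modified BIC is assumed to take the form -2 log L + log(n) * d_e, where d_e is the total parameter count minus the number of means estimated as zero. All function names, the grid of (g, lambda) values, and the toy data are hypothetical.

```python
import math
import random


def standardize(X):
    """Center each variable to mean 0 and scale it to variance 1."""
    n, K = len(X), len(X[0])
    means = [sum(x[k] for x in X) / n for k in range(K)]
    sds = [math.sqrt(sum((x[k] - means[k]) ** 2 for x in X) / n) for k in range(K)]
    return [[(x[k] - means[k]) / sds[k] for k in range(K)] for x in X]


def soft_threshold(z, t):
    """sign(z) * max(|z| - t, 0): the thresholding step induced by an L1 penalty."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def log_normal_diag(x, mu, var):
    """Log density of a Gaussian with diagonal covariance diag(var)."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xk - mk) ** 2 / v
                      for xk, mk, v in zip(x, mu, var))


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def penalized_em(X, g, lam, n_iter=100, seed=0):
    """EM for a g-component Gaussian mixture with a common diagonal covariance
    and penalty lam * sum_i sum_k |mu_ik| on the means (X assumed standardized)."""
    n, K = len(X), len(X[0])
    rng = random.Random(seed)
    mu = [list(x) for x in rng.sample(X, g)]  # start means at random observations
    var = [1.0] * K
    pi = [1.0 / g] * g
    for _ in range(n_iter):
        # E-step: posterior membership probabilities tau[j][i].
        tau = []
        for x in X:
            logs = [math.log(pi[i]) + log_normal_diag(x, mu[i], var) for i in range(g)]
            lse = logsumexp(logs)
            tau.append([math.exp(lv - lse) for lv in logs])
        # M-step for mixing proportions (floor guards against empty components).
        nk = [max(sum(t[i] for t in tau), 1e-10) for i in range(g)]
        pi = [nk[i] / n for i in range(g)]
        # M-step for means (ECM-style): soft-threshold the unpenalized weighted
        # mean, using the variances from the previous iteration.
        for i in range(g):
            for k in range(K):
                mu_tilde = sum(tau[j][i] * X[j][k] for j in range(n)) / nk[i]
                mu[i][k] = soft_threshold(mu_tilde, lam * var[k] / nk[i])
        # M-step for the common diagonal variances, given the new means.
        var = [max(sum(tau[j][i] * (X[j][k] - mu[i][k]) ** 2
                       for j in range(n) for i in range(g)) / n, 1e-6)
               for k in range(K)]
    return pi, mu, var


def log_likelihood(X, pi, mu, var):
    """Unpenalized mixture log-likelihood."""
    return sum(logsumexp([math.log(p) + log_normal_diag(x, m, var)
                          for p, m in zip(pi, mu)]) for x in X)


def modified_bic(X, pi, mu, var):
    """-2 log L + log(n) * d_e, where d_e excludes means estimated as exactly 0."""
    n, g, K = len(X), len(mu), len(var)
    q = sum(1 for row in mu for m in row if m == 0.0)
    d_e = (g - 1) + K + g * K - q
    return -2 * log_likelihood(X, pi, mu, var) + math.log(n) * d_e


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical toy data: two clusters that differ only in variables 0 and 1.
    raw = []
    for j in range(60):
        shift = 2.0 if j < 30 else -2.0
        raw.append([rng.gauss(shift if k < 2 else 0.0, 1.0) for k in range(6)])
    X = standardize(raw)
    # Choose (g, lam) jointly by minimizing the modified BIC over a small grid.
    _, g, lam = min((modified_bic(X, *penalized_em(X, g, lam)), g, lam)
                    for g in (1, 2, 3) for lam in (0.0, 5.0, 20.0))
    pi, mu, var = penalized_em(X, g, lam)
    selected = [k for k in range(len(var)) if any(row[k] != 0.0 for row in mu)]
    print("g =", g, "lambda =", lam, "informative variables:", selected)
```

The sketch selects the number of components and the penalty jointly by evaluating the modified BIC over a grid, which mirrors the role the abstract assigns to that criterion. A larger lambda raises every threshold, so more means are set to zero and more variables are deselected. Because the data are centered, a zero mean in all components means that variable does not separate the clusters.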