Simultaneous Feature Selection and Clustering Using Mixture Models

Authors:
Martin H. C. Law;Mario A. T. Figueiredo;Anil K. Jain
Affiliations:
IEEE;IEEE;IEEE
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2004

Abstract

Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While there exist many algorithms for clustering, the important issue of feature selection, that is, what attributes of the data should be used by the clustering algorithms, is rarely touched upon. Feature selection for clustering is difficult because, unlike in supervised learning, there are no class labels for the data and, thus, no obvious criteria to guide the search. Another important problem in clustering is the determination of the number of clusters, which clearly impacts and is influenced by the feature selection issue. In this paper, we propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm to estimate it, in the context of mixture-based clustering. Due to the introduction of a minimum message length model selection criterion, the saliency of irrelevant features is driven toward zero, which corresponds to performing feature selection. The criterion and algorithm are then extended to simultaneously estimate the feature saliencies and the number of clusters.
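
The saliency idea described in the abstract can be illustrated with a small sketch. Everything specific below is an assumption for illustration, not taken from the abstract: each feature is modeled as salient with probability `rho[l]`, a salient feature follows a cluster-specific Gaussian, and a non-salient feature follows a single Gaussian shared by all clusters. The function name `saliency_em` and its parameters are hypothetical. The sketch uses plain maximum-likelihood EM with a fixed number of clusters `k`; the minimum message length criterion that drives irrelevant saliencies toward zero and selects the number of clusters is deliberately omitted.

```python
import numpy as np


def log_gauss(y, mean, var):
    """Elementwise log-density of a univariate Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + (y - mean) ** 2 / var)


def saliency_em(Y, k, n_iter=100, seed=0):
    """Illustrative EM for a Gaussian mixture with per-feature saliencies.

    Assumed model (not stated in the abstract): feature l is drawn from a
    cluster-specific Gaussian with probability rho[l], and from a common
    Gaussian shared by all clusters with probability 1 - rho[l].
    """
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    alpha = np.full(k, 1.0 / k)                           # mixing weights
    rho = np.full(d, 0.5)                                 # feature saliencies
    mu = Y[rng.choice(n, size=k, replace=False)].copy()   # (k, d) cluster means
    var = np.tile(Y.var(axis=0), (k, 1)) + 1e-6           # (k, d) cluster variances
    mu0 = Y.mean(axis=0)                                  # common-density mean
    var0 = Y.var(axis=0) + 1e-6                           # common-density variance

    for _ in range(n_iter):
        # E-step, computed in log space for numerical stability.
        log_a = np.log(rho + 1e-12) + log_gauss(Y[:, None, :], mu, var)
        log_b = np.log(1 - rho + 1e-12) + log_gauss(Y, mu0, var0)[:, None, :]
        log_c = np.logaddexp(log_a, log_b)                # (n, k, d)
        log_w = np.log(np.maximum(alpha, 1e-300)) + log_c.sum(axis=2)
        log_w -= np.logaddexp.reduce(log_w, axis=1, keepdims=True)
        w = np.exp(log_w)                                 # cluster responsibilities
        u = np.exp(log_a - log_c) * w[:, :, None]         # "cluster j, feature salient"
        v = w[:, :, None] - u                             # "cluster j, feature not salient"

        # M-step: weighted maximum-likelihood updates.
        alpha = w.mean(axis=0)
        su = u.sum(axis=0) + 1e-12
        mu = (u * Y[:, None, :]).sum(axis=0) / su
        var = (u * (Y[:, None, :] - mu) ** 2).sum(axis=0) / su + 1e-6
        sv = v.sum(axis=(0, 1)) + 1e-12
        mu0 = (v * Y[:, None, :]).sum(axis=(0, 1)) / sv
        var0 = (v * (Y[:, None, :] - mu0) ** 2).sum(axis=(0, 1)) / sv + 1e-6
        rho = np.clip(u.sum(axis=(0, 1)) / n, 0.0, 1.0)

    return alpha, rho, mu, var
```

Calling `saliency_em(Y, k=2)` on an `(n, d)` array returns the mixing weights, one saliency per feature, and the cluster-specific means and variances. Without a penalty term such as the minimum message length criterion, the saliency of an irrelevant feature is only weakly identified and need not shrink to zero, which is the gap the criterion in the paper is meant to close.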