Feature Selection for Gene Expression Using Model-Based Entropy

Authors:
Shenghuo Zhu;Dingding Wang;Kai Yu;Tao Li;Yihong Gong
Affiliations:
NEC Laboratories America, Cupertino;Florida International University, Miami;NEC Laboratories America, Cupertino;Florida International University, Miami;NEC Laboratories America, Cupertino
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2010

Abstract

Gene expression data usually contain a large number of genes but a small number of samples. Feature selection for gene expression data aims at finding a set of genes that best discriminates biological samples of different types. Traditional gene selection, which applies machine learning techniques based on empirical mutual information, suffers from data sparseness because of the small number of samples. To overcome the sparseness issue, we propose a model-based approach that estimates the entropy of the class variables on a fitted model instead of on the data themselves. Here, we use multivariate normal distributions to fit the data, because the multivariate normal distribution has maximum entropy among all real-valued distributions with a specified mean and covariance and is widely used to approximate various distributions. Under the assumption that the data follow a multivariate normal distribution, the conditional distribution of the class variables given the selected features is also normal, so its entropy can be computed from the log-determinant of its covariance matrix. Because of the large number of genes, computing all possible log-determinants is not efficient. We propose several algorithms that substantially reduce the computational cost. Experiments on seven gene data sets and a comparison with five other approaches show the accuracy of the multivariate Gaussian generative model for feature selection and the efficiency of our algorithms.
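
As an illustration of the entropy computation described in the abstract, the following is a minimal sketch and not the authors' implementation. It relies on the standard result that, for jointly Gaussian variables, the differential entropy of k class variables Y given selected features X_S is H(Y | X_S) = 0.5 * (k * log(2*pi*e) + log det(Sigma_Y|S)), where Sigma_Y|S = Sigma_YY - Sigma_YS * inv(Sigma_SS) * Sigma_SY. The function names `conditional_entropy` and `greedy_select`, the one-hot encoding of class labels, the `ridge` regularization term, and the naive forward-selection loop are all assumptions introduced here for illustration.

```python
import numpy as np


def conditional_entropy(cov, y_idx, s_idx, ridge=1e-6):
    """Differential entropy (in nats) of Y given X_S under a joint Gaussian.

    cov   : joint covariance matrix of all features and class variables
    y_idx : indices of the class variables in cov
    s_idx : indices of the selected features in cov (may be empty)
    ridge : small diagonal term (an assumption of this sketch) that keeps
            the matrices invertible when samples are scarce
    """
    y_idx = np.asarray(y_idx, dtype=int)
    s_idx = np.asarray(s_idx, dtype=int)
    k = y_idx.size
    c_cond = cov[np.ix_(y_idx, y_idx)].copy()
    if s_idx.size:
        c_ss = cov[np.ix_(s_idx, s_idx)] + ridge * np.eye(s_idx.size)
        c_sy = cov[np.ix_(s_idx, y_idx)]
        # Schur complement: covariance of Y conditioned on X_S
        c_cond = c_cond - c_sy.T @ np.linalg.solve(c_ss, c_sy)
    c_cond = c_cond + ridge * np.eye(k)
    _, logdet = np.linalg.slogdet(c_cond)
    return 0.5 * (k * np.log(2.0 * np.pi * np.e) + logdet)


def greedy_select(X, Y, n_select, ridge=1e-6):
    """Naive forward selection: repeatedly add the feature that minimizes
    H(Y | X_S). Every log-determinant is recomputed from scratch."""
    d = X.shape[1]
    cov = np.cov(np.hstack([X, Y]), rowvar=False)
    y_idx = np.arange(d, d + Y.shape[1])
    selected = []
    for _ in range(n_select):
        candidates = [j for j in range(d) if j not in selected]
        scores = [
            conditional_entropy(cov, y_idx, selected + [j], ridge)
            for j in candidates
        ]
        selected.append(candidates[int(np.argmin(scores))])
    return selected
```

Here `X` is assumed to be a samples-by-genes array and `Y` a samples-by-classes one-hot array. The loop above evaluates one log-determinant per candidate gene at every step, which is the kind of cost the abstract says the proposed algorithms reduce; those algorithms are not reproduced in this sketch.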