Ensemble gene selection for cancer classification

Authors:
Huawen Liu;Lei Liu;Huijie Zhang
Affiliations:
College of Computer Science and Technology, Jilin University, Changchun 130012, PR China;College of Computer Science and Technology, Jilin University, Changchun 130012, PR China;Department of Computer Science, Northeast Normal University, Changchun 130021, PR China
Venue:
Pattern Recognition
Year:
2010

Abstract

Cancer diagnosis is an important emerging clinical application of microarray data. Accurately predicting the type or size of a tumor relies on powerful and reliable classification models, so that patients can receive better treatment or respond better to therapy. However, the high dimensionality of microarray data can cause problems for traditional classification models, such as over-fitting, poor performance and low efficiency. Thus, one of the challenging tasks in cancer diagnosis is to identify, among the thousands of genes in microarray data, the salient expressed genes that directly contribute to the phenotype or symptoms of the disease. In this paper, we propose a new ensemble gene selection method (EGS) that chooses multiple gene subsets for classification, where the significance of a gene is measured by conditional mutual information or its normalized form. Different gene subsets are obtained by setting different starting points for the search procedure; each subset is used to train a base classifier, and the base classifiers are then aggregated into a consensus classifier by majority voting. The proposed method is compared with five popular gene selection methods on six public microarray datasets, and the comparison results show that our method works well.
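The pipeline described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not give the exact search criterion, so the greedy rule below (maximize the minimum conditional mutual information with the class given each already selected gene) is an assumption, as are the 1-nearest-neighbor base learner, the toy discretized data, and every function name. The normalized form of conditional mutual information is omitted.

```python
from collections import Counter
from math import log

def cond_mutual_info(x, y, z):
    """Estimate I(X; Y | Z) in nats from three equal-length discrete sequences."""
    n = len(x)
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    total = 0.0
    for (a, b, c), count in n_xyz.items():
        # p(a,b,c) * log( p(c) p(a,b,c) / (p(a,c) p(b,c)) ), written with counts
        total += (count / n) * log(n_z[c] * count / (n_xz[(a, c)] * n_yz[(b, c)]))
    return total

def select_genes(X, y, start, k):
    """Greedy search from gene index `start` (assumed criterion, see lead-in)."""
    cols = list(zip(*X))  # one tuple of values per gene
    selected = [start]
    while len(selected) < k:
        candidates = [g for g in range(len(cols)) if g not in selected]
        # Score: least information g adds about y given any one selected gene
        best = max(
            candidates,
            key=lambda g: min(cond_mutual_info(cols[g], y, cols[s]) for s in selected),
        )
        selected.append(best)
    return selected

def nearest_neighbor(X, y, genes):
    """Hypothetical base learner: 1-NN with Hamming distance on `genes`."""
    def predict(sample):
        def dist(row):
            return sum(row[g] != sample[g] for g in genes)
        nearest = min(range(len(X)), key=lambda i: dist(X[i]))
        return y[nearest]
    return predict

def ensemble_sketch(X, y, starts, k):
    """One gene subset and one base classifier per starting gene, then voting."""
    subsets = [select_genes(X, y, s, k) for s in starts]
    models = [nearest_neighbor(X, y, genes) for genes in subsets]
    def predict(sample):
        votes = Counter(model(sample) for model in models)
        return votes.most_common(1)[0][0]  # majority vote
    return subsets, predict

# Toy discretized expression data (made up): rows are samples, columns are genes.
X = [
    [0, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
]
y = [0, 0, 1, 1, 0, 1]

subsets, predict = ensemble_sketch(X, y, starts=[0, 1, 2], k=2)
print(subsets, predict([1, 0, 0, 1]))
```

Starting the search from different genes is what makes the subsets differ in this sketch. With real microarray data the expression values would first need to be discretized, and the count-based estimate of conditional mutual information becomes unreliable when samples are few.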