A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression

Authors:
Tao Li;Chengliang Zhang;Mitsunori Ogihara
Affiliations:
Computer Science Department, University of Rochester, Rochester, NY 14627-0226, USA;Computer Science Department, University of Rochester, Rochester, NY 14627-0226, USA;Computer Science Department, University of Rochester, Rochester, NY 14627-0226, USA
Venue:
Bioinformatics
Year:
2004

Citing 0
Cited 103

Using Uncorrelated Discriminant Analysis for Tissue Classification with Gene Expression Data

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Robust and Accurate Cancer Classification with Gene Expression Profiling

CSB '05 Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference
Pattern classification in DNA microarray data of multiple tumor types

Pattern Recognition
Accurate Cancer Classification Using Expressions of Very Few Genes

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Multiclass Cancer Classification Using Semisupervised Ellipsoid ARTMAP and Particle Swarm Optimization with Gene Expression Data

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Differential prioritization in feature selection and classifier aggregation for multiclass microarray datasets

Data Mining and Knowledge Discovery
Computational and Theoretical Analysis of Null Space and Orthogonal Linear Discriminant Analysis

The Journal of Machine Learning Research
Markov blanket-embedded genetic algorithm for gene selection

Pattern Recognition
Direct integration of microarrays for selecting informative genes and phenotype classification

Information Sciences: an International Journal
Multicategory Classification Using An Extreme Learning Machine for Microarray Gene Expression Cancer Diagnosis

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Gene selection for multiclass prediction by weighted fisher criterion

EURASIP Journal on Bioinformatics and Systems Biology
Selecting differentially expressed genes using minimum probability of classification error

Journal of Biomedical Informatics
Gene expression modeling through positive boolean functions

International Journal of Approximate Reasoning
Research Article: Hybrid particle swarm optimization and tabu search approach for selecting genes for tumor classification using gene expression data

Computational Biology and Chemistry
Structural Risk Minimisation based gene expression profiling analysis

International Journal of Bioinformatics Research and Applications
Detecting reliable gene interactions by a hierarchy of Bayesian network classifiers

Computer Methods and Programs in Biomedicine
Stable feature selection via dense feature groups

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Identifying biologically relevant genes via multiple heterogeneous data sources

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Heterogeneous data fusion for alzheimer's disease study

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
A probabilistic multi-class strategy of one-vs.-rest support vector machines for cancer classification

Neurocomputing
A Cost-Sensitive Approach to Feature Selection in Micro-Array Data Classification

WILF '07 Proceedings of the 7th international workshop on Fuzzy Logic and Applications: Applications of Fuzzy Sets Theory
Ensemble Neural Networks with Novel Gene-Subsets for Multiclass Cancer Classification

Neural Information Processing
A Model-Based Relevance Estimation Approach for Feature Selection in Microarray Datasets

ICANN '08 Proceedings of the 18th international conference on Artificial Neural Networks, Part II
Investigating the Efficacy of Nonlinear Dimensionality Reduction Schemes in Classifying Gene and Protein Expression Studies

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
APPLYING DATA MINING TECHNIQUES FOR CANCER CLASSIFICATION ON GENE EXPRESSION DATA

Cybernetics and Systems
Performance of feature-selection methods in the classification of high-dimension data

Pattern Recognition
A Study on the Importance of Differential Prioritization in Feature Selection Using Toy Datasets

PRIB '08 Proceedings of the Third IAPR International Conference on Pattern Recognition in Bioinformatics
Efficient multi-class cancer diagnosis algorithm, using a global similarity pattern

Computational Statistics & Data Analysis
New gene selection method for multiclass tumor classification by class centroid

Journal of Biomedical Informatics
Simple Bayesian binary framework for discovering significant genes and classifying cancer diagnosis

Computational Statistics & Data Analysis
An expert system to classify microarray gene expression data using gene selection by decision tree

Expert Systems with Applications: An International Journal
Evaluating switching neural networks through artificial and real gene expression data

Artificial Intelligence in Medicine
F-score with Pareto Front Analysis for Multiclass Gene Selection

EvoBIO '09 Proceedings of the 7th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics
Feature cluster selection for high-throughput data analysis

International Journal of Data Mining and Bioinformatics
The Impact of Gene Selection on Imbalanced Microarray Expression Data

BICoB '09 Proceedings of the 1st International Conference on Bioinformatics and Computational Biology
Parallel Selection of Informative Genes for Classification

BICoB '09 Proceedings of the 1st International Conference on Bioinformatics and Computational Biology
Multiclass classification and gene selection with a stochastic algorithm

Computational Statistics & Data Analysis
Gene boosting for cancer classification based on gene expression profiles

Pattern Recognition
Optimal Aggregation of Binary Classifiers for Multiclass Cancer Diagnosis Using Gene Expression Profiles

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Utilization of virtual samples to facilitate cancer identification for DNA microarray data in the early stages of an investigation

Information Sciences: an International Journal
Consensus group stable feature selection

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Detailed methylation prediction of CpG islands on human chromosome 21

MCBC'09 Proceedings of the 10th WSEAS international conference on Mathematics and computers in biology and chemistry
Interpretation of gene expression microarray experiments

Proceedings of the 2007 conference on Emerging Artificial Intelligence Applications in Computer Engineering: Real Word AI Systems with Applications in eHealth, HCI, Information Retrieval and Pervasive Technologies
Comparison of feature selection and classification combinations for cancer classification using microarray data

International Journal of Bioinformatics Research and Applications
A Framework for Multi-class Learning in Micro-array Data Analysis

AIME '09 Proceedings of the 12th Conference on Artificial Intelligence in Medicine: Artificial Intelligence in Medicine
Brief communication: Reducing multiclass cancer classification to binary by output coding and SVM

Computational Biology and Chemistry
Exploiting scale-free information from expression data for cancer classification

Computational Biology and Chemistry
Incremental non-gaussian analysis of microarray gene expression data

Proceedings of the third international workshop on Data and text mining in bioinformatics
Feature selection with biased sample distributions

IRI'09 Proceedings of the 10th IEEE international conference on Information Reuse & Integration
Multi-category bioinformatics dataset classification using extreme learning machine

CEC'09 Proceedings of the Eleventh conference on Congress on Evolutionary Computation
Binary matrix factorization for analyzing gene expression data

Data Mining and Knowledge Discovery
Ensemble gene selection by grouping for microarray data classification

Journal of Biomedical Informatics
Feature Selection for Gene Expression Using Model-Based Entropy

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
A Multiple-Filter-Multiple-Wrapper Approach to Gene Selection and Microarray Data Classification

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Tumor classification by combining PNN classifier ensemble with neighborhood rough set based gene reduction

Computers in Biology and Medicine
Tumor tissue identification based on gene expression data using DWT feature extraction and PNN classifier

Neurocomputing
Clustering of high-dimensional gene expression data with feature filtering methods and diffusion maps

Artificial Intelligence in Medicine
Multiclass microarray data classification using GA/ANN method

PRICAI'06 Proceedings of the 9th Pacific Rim international conference on Artificial intelligence
A support vector machine ensemble for cancer classification using gene expression data

ISBRA'07 Proceedings of the 3rd international conference on Bioinformatics research and applications
Capturing heuristics and intelligent methods for improving micro-array data classification

IDEAL'07 Proceedings of the 8th international conference on Intelligent data engineering and automated learning
Ensemble approaches of support vector machines for multiclass classification

PReMI'07 Proceedings of the 2nd international conference on Pattern recognition and machine intelligence
Identification of Full and Partial Class Relevant Genes

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
LIBGS: A MATLAB software package for gene selection

International Journal of Data Mining and Bioinformatics
Towards a memetic feature selection paradigm

IEEE Computational Intelligence Magazine
Quadratic Programming Feature Selection

The Journal of Machine Learning Research
Matched Gene Selection and Committee Classifier for Molecular Classification of Heterogeneous Diseases

The Journal of Machine Learning Research
Selecting few genes for microarray gene expression classification

CAEPIA'09 Proceedings of the Current topics in artificial intelligence, and 13th conference on Spanish association for artificial intelligence
Ensemble methods and model based diagnosis using possible conflicts and system decomposition

IEA/AIE'10 Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part II
An overview of ensemble methods for binary classifiers in multi-class problems: Experimental study on one-vs-one and one-vs-all schemes

Pattern Recognition
Multi-class pattern classification based on a probabilistic model of combining binary classifiers

ICANN'05 Proceedings of the 15th international conference on Artificial neural networks: formal models and their applications - Volume Part II
Multi-class cancer classification with OVR-support vector machines selected by naïve bayes classifier

ICONIP'06 Proceedings of the 13th international conference on Neural information processing - Volume Part III
Two-Step Cross-Entropy Feature Selection for Microarrays—Power Through Complementarity

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Multi-platform gene-expression mining and marker gene analysis

International Journal of Data Mining and Bioinformatics
Wrapper- and ensemble-based feature subset selection methods for biomarker discovery in targeted metabolomics

PRIB'11 Proceedings of the 6th IAPR international conference on Pattern recognition in bioinformatics
Stable Gene Selection from Microarray Data via Sample Weighting

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Knowledge discovery in the identification of differentially expressed genes in tumoricidal macrophage

IDA'05 Proceedings of the 6th international conference on Advances in Intelligent Data Analysis
Relevance, redundancy and differential prioritization in feature selection for multiclass gene expression data

ISBMDA'05 Proceedings of the 6th International conference on Biological and Medical Data Analysis
Microarray gene expression classification with few genes: Criteria to combine attribute selection and classification methods

Expert Systems with Applications: An International Journal
OVA scheme vs. single machine approach in feature selection for microarray datasets

ICDM'06 Proceedings of the 6th Industrial Conference on Data Mining conference on Advances in Data Mining: applications in Medicine, Web Mining, Marketing, Image and Signal Mining
Virtual gene: using correlations between genes to select informative genes on microarray datasets

Transactions on Computational Systems Biology II
A framework and its empirical study of automatic diagnosis of traditional Chinese medicine utilizing raw free-text clinical records

Journal of Biomedical Informatics
A novel combination of time phase and EEG frequency components for SSVEP-Based BCI

ICONIP'11 Proceedings of the 18th international conference on Neural Information Processing - Volume Part I
Comparison of gene identification based on artificial neural network pre-processing with k-means cluster and principal component analysis

WILF'05 Proceedings of the 6th international conference on Fuzzy Logic and Applications
A Top-r Feature Selection Algorithm for Microarray Gene Expression Data

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Coordinate ascent for penalized semiparametric regression on high-dimensional panel count data

Computational Statistics & Data Analysis
Efficient classifiers for multi-class classification problems

Decision Support Systems
Evaluation of the importance of data pre-processing order when combining feature selection and data sampling

International Journal of Business Intelligence and Data Mining
Informative gene selection and tumor classification by null space LDA for microarray data

ESCAPE'07 Proceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies
Concurrent control chart patterns recognition with singular spectrum analysis and support vector machine

Computers and Industrial Engineering
Feature selection using counting grids: application to microarray data

SSPR'12/SPR'12 Proceedings of the 2012 Joint IAPR international conference on Structural, Syntactic, and Statistical Pattern Recognition
Online monitoring and fault identification of mean shifts in bivariate processes using decision tree learning techniques

Journal of Intelligent Manufacturing
A Multiclass Classification Tool Using Cloud Computing Architecture

ASONAM '12 Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012)
Selection of interdependent genes via dynamic relevance analysis for cancer diagnosis

Journal of Biomedical Informatics
Performance evaluation of ranking methods for relevant gene selection in cancer microarray datasets

MICAI'12 Proceedings of the 11th Mexican international conference on Advances in Artificial Intelligence - Volume Part I
An ensemble of SVM classifiers based on gene pairs

Computers in Biology and Medicine
Multiclass Gene Selection Using Pareto-Fronts

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Statistical shape model for manifold regularization: Gleason grading of prostate histology

Computer Vision and Image Understanding
Stable Feature Selection with Minimal Independent Dominating Sets

Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
Identifying informative genes for prediction of breast cancer subtypes

PRIB'13 Proceedings of the 8th IAPR international conference on Pattern Recognition in Bioinformatics
What's buzzing in the blizzard of buzz? Automotive component isolation in social media postings

Decision Support Systems
A feature selection method using fixed-point algorithm for DNA microarray gene expression data

International Journal of Knowledge-based and Intelligent Engineering Systems
A feature selection method using improved regularized linear discriminant analysis

Machine Vision and Applications

Quantified Score

Hi-index	3.84

Abstract

Summary: This paper studies the problem of building multiclass classifiers for tissue classification based on gene expression. The recent development of microarray technologies has enabled biologists to quantify the expression of tens of thousands of genes in a single experiment, and biologists have begun collecting gene expression data for large numbers of samples. One of the urgent issues in the use of microarray data is to develop methods for characterizing samples based on their gene expression. The most basic step in this research direction is binary sample classification, which has been studied extensively over the past few years. This paper investigates the next step: multiclass classification of samples based on gene expression. The characteristics of expression data (e.g. a large number of genes with a small sample size) make the classification problem more challenging. The process of building multiclass classifiers is divided into two components: (i) selection of the features (i.e. genes) to be used for training and testing and (ii) selection of the classification method. This paper compares various feature selection methods as well as various state-of-the-art classification methods on various multiclass gene expression datasets. Our study indicates that the multiclass classification problem is much more difficult than the binary one for gene expression datasets. The difficulty lies in the fact that the data are of high dimensionality and that the sample size is small. The classification accuracy appears to degrade very rapidly as the number of classes increases. In particular, the accuracy was very low regardless of the choice of methods for datasets with a large number of classes (e.g. NCI60 and GCM). While increasing the number of samples is a plausible solution to the problem of accuracy degradation, it is important to develop algorithms that can effectively analyze multiple-class expression data for these special datasets.
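
The two-component process described in the summary (gene selection followed by a multiclass classifier) can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract does not name the specific techniques compared, so the between/within variance ratio used for ranking genes, the nearest-centroid classifier, the synthetic data generator, and all function names and parameters (`f_score`, `select_genes`, `make_data`, `shift`, and so on) are assumptions chosen for brevity.

```python
import random
from statistics import mean


def f_score(values, labels):
    """Between-class over within-class sum of squares for a single gene."""
    overall = mean(values)
    between = within = 0.0
    for c in set(labels):
        group = [v for v, y in zip(values, labels) if y == c]
        mu = mean(group)
        between += len(group) * (mu - overall) ** 2
        within += sum((v - mu) ** 2 for v in group)
    return between / (within + 1e-12)  # small constant avoids division by zero


def select_genes(X, y, k):
    """Component (i): indices of the k highest-scoring genes."""
    n_genes = len(X[0])
    scores = [f_score([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def fit_centroids(X, y, genes):
    """Component (ii), training: per-class mean over the selected genes."""
    centroids = {}
    for c in set(y):
        rows = [row for row, label in zip(X, y) if label == c]
        centroids[c] = [mean(r[g] for r in rows) for g in genes]
    return centroids


def predict(centroids, row, genes):
    """Component (ii), prediction: class with the closest centroid."""
    def sq_dist(c):
        return sum((row[g] - m) ** 2 for g, m in zip(genes, centroids[c]))
    return min(centroids, key=sq_dist)


def make_data(rng, n_per_class, n_classes, n_genes, n_informative, shift=4.0):
    """Hypothetical synthetic data: many genes, few samples.

    Only the first n_informative genes carry class signal; gene g is
    shifted upward for samples of class g % n_classes.
    """
    X, y = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(n_informative):
                if g % n_classes == c:
                    row[g] += shift
            X.append(row)
            y.append(c)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(rng, 10, 4, 200, 8)
X_test, y_test = make_data(rng, 10, 4, 200, 8)

# Genes are selected on the training samples only, so the test
# samples play no part in choosing the features.
genes = select_genes(X_train, y_train, k=8)
centroids = fit_centroids(X_train, y_train, genes)
predictions = [predict(centroids, row, genes) for row in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"selected genes: {sorted(genes)}")
print(f"test accuracy: {accuracy:.2f}")
```

The synthetic data here is deliberately easy (a strong class signal in a handful of genes), so it does not reproduce the accuracy degradation the summary reports for real datasets. Raising `n_classes` while lowering `n_per_class` or `shift` gives a rough feel for how the small-sample, high-dimensional setting becomes harder as the number of classes grows.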