A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis

Authors:
Alexander Statnikov, Constantin F. Aliferis, Ioannis Tsamardinos, Douglas Hardin, Shawn Levy
Affiliations:
Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA (Statnikov, Aliferis, Tsamardinos, Levy); Department of Mathematics, Vanderbilt University, Nashville, TN, USA (Hardin)
Venue:
Bioinformatics
Year:
2005

Abstract

Motivation: Cancer diagnosis is one of the most important emerging clinical applications of gene expression microarray technology. We seek to develop a computer system that builds powerful and reliable cancer diagnostic models from microarray data. To keep a realistic perspective on clinical applications, we focus on multicategory diagnosis. To equip the system with the optimal combination of classifier, gene selection and cross-validation methods, we performed a systematic and comprehensive evaluation of several major algorithms for multicategory classification, several gene selection methods, multiple ensemble classifier methods and two cross-validation designs, using 11 datasets spanning 74 diagnostic categories, 41 cancer types and 12 normal tissue types.

Results: Multicategory support vector machines (MC-SVMs) are the most effective classifiers for accurate cancer diagnosis from gene expression data. The MC-SVM techniques of Crammer and Singer and of Weston and Watkins, together with one-versus-rest, were found to be the best methods in this domain. MC-SVMs outperform other popular machine learning algorithms, such as k-nearest neighbors, backpropagation and probabilistic neural networks, often to a remarkable degree. Gene selection techniques can significantly improve the classification performance of both MC-SVMs and non-SVM learning algorithms. Ensemble classifiers do not generally improve on the performance of the best non-ensemble models. These results guided the construction of the software system GEMS (Gene Expression Model Selector), which automates high-quality model construction and enforces sound optimization and performance estimation procedures. This is the first such system to be informed by a rigorous comparative analysis of the available algorithms and datasets.

Availability: The software system GEMS is available for download from http://www.gems-system.org for non-commercial use.

Contact: alexander.statnikov@vanderbilt.edu
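The three MC-SVM approaches singled out in the abstract differ mainly in how they are trained. For reference, their standard textbook forms are sketched below. The notation is assumed, not taken from the paper: $\phi$ is the kernel feature map, $K$ the number of diagnostic categories, $n$ the number of training samples, $y_i$ the category of sample $x_i$, $C$ the misclassification penalty and $\delta$ the Kronecker delta. All three predict the category with the largest decision value; the Crammer and Singer form has no bias terms $b_k$.

```latex
% One-versus-rest: K independent binary SVMs (category k vs. all others), then
f(x) = \arg\max_{k=1,\dots,K} \left( w_k^{\top}\phi(x) + b_k \right)

% Weston and Watkins: a single joint problem over all K categories
\min_{w,b,\xi}\ \frac{1}{2}\sum_{k=1}^{K}\lVert w_k\rVert^2 + C\sum_{i=1}^{n}\sum_{k\neq y_i}\xi_{ik}
\quad\text{s.t.}\quad w_{y_i}^{\top}\phi(x_i) + b_{y_i} \ge w_k^{\top}\phi(x_i) + b_k + 2 - \xi_{ik},
\quad \xi_{ik}\ge 0,\ k\neq y_i

% Crammer and Singer: one slack variable per sample, no bias terms
\min_{w,\xi}\ \frac{1}{2}\sum_{k=1}^{K}\lVert w_k\rVert^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad w_{y_i}^{\top}\phi(x_i) - w_k^{\top}\phi(x_i) \ge 1 - \delta_{y_i k} - \xi_i,
\quad \forall k
```

In the Crammer and Singer constraints, the case $k = y_i$ reduces to $\xi_i \ge 0$, so no separate non-negativity constraint is needed.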
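The abstract credits GEMS with enforcing sound optimization and performance estimation procedures but does not spell them out. A standard way to achieve this is nested cross-validation: an inner loop chooses hyperparameters using only the training portion of each outer fold, and the outer loop measures accuracy on samples never seen during that choice. The standard-library Python sketch below illustrates the idea under stated assumptions; it is not GEMS code. To stay short, it uses k-nearest neighbors (one of the baselines the abstract names) in place of an MC-SVM, and an assumed gene filter, the between-class to within-class sum-of-squares ratio (BW), with the number of genes tuned alongside k. All function names, the parameter grid and the toy data are hypothetical.

```python
# Hypothetical sketch (not GEMS code): nested CV, BW gene filter, k-NN in place of an MC-SVM.
import random
from collections import Counter


def bw_scores(X, y):
    """BW ratio per gene: between-class over within-class sum of squares."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        overall = sum(col) / len(col)
        between = within = 0.0
        for c in classes:
            vals = [v for v, lab in zip(col, y) if lab == c]
            mean_c = sum(vals) / len(vals)
            between += len(vals) * (mean_c - overall) ** 2
            within += sum((v - mean_c) ** 2 for v in vals)
        # The tiny constant avoids division by zero for genes with no spread.
        scores.append(between / (within + 1e-12))
    return scores


def select_top(scores, m):
    """Indices of the m highest-scoring genes (ties keep the lower index)."""
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:m]


def knn_predict(train_X, train_y, x, k, genes):
    """k-NN vote using squared Euclidean distance over the selected genes."""
    dists = sorted(
        (sum((row[j] - x[j]) ** 2 for j in genes), lab)
        for row, lab in zip(train_X, train_y)
    )
    nearest = [lab for _, lab in dists[:k]]
    votes = Counter(nearest)
    best = max(votes.values())
    # Break vote ties in favour of the class of the closest tied neighbour.
    return next(lab for lab in nearest if votes[lab] == best)


def stratified_folds(y, n_folds):
    """Group indices by class (stable), then deal them round-robin into folds."""
    folds = [[] for _ in range(n_folds)]
    for pos, i in enumerate(sorted(range(len(y)), key=lambda i: y[i])):
        folds[pos % n_folds].append(i)
    return folds


def accuracy(X, y, train_idx, test_idx, k, n_genes):
    """Fit gene selection + k-NN on train_idx only, then score on test_idx."""
    tr_X = [X[i] for i in train_idx]
    tr_y = [y[i] for i in train_idx]
    # Genes are ranked on the training portion only, so held-out samples
    # never influence which genes are used (avoids gene selection bias).
    genes = select_top(bw_scores(tr_X, tr_y), n_genes)
    hits = sum(knn_predict(tr_X, tr_y, X[i], k, genes) == y[i] for i in test_idx)
    return hits / len(test_idx)


def nested_cv(X, y, grid, n_outer=3, n_inner=3):
    """Outer loop estimates accuracy; inner loop picks (k, n_genes) from grid."""
    outer_acc, chosen = [], []
    for test_idx in stratified_folds(y, n_outer):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        # Inner folds hold positions within train_idx, built from its labels.
        inner = stratified_folds([y[i] for i in train_idx], n_inner)

        def inner_score(params):
            total = 0.0
            for fold in inner:
                fold_set = set(fold)
                tr = [t for p, t in enumerate(train_idx) if p not in fold_set]
                te = [train_idx[p] for p in fold]
                total += accuracy(X, y, tr, te, *params)
            return total / len(inner)

        best = max(grid, key=inner_score)  # first best setting wins ties
        chosen.append(best)
        outer_acc.append(accuracy(X, y, train_idx, test_idx, *best))
    return sum(outer_acc) / len(outer_acc), chosen


def make_toy_data(seed=0):
    """Hypothetical data: 3 classes x 6 samples, 2 informative + 3 noise genes."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(3):
        for r in range(6):
            informative = [10.0 * c + 0.1 * r, -10.0 * c + 0.1 * r]
            noise = [rng.uniform(-1.0, 1.0) for _ in range(3)]
            X.append(informative + noise)
            y.append(c)
    return X, y
```

Calling `nested_cv(*make_toy_data(), grid=[(1, 2), (3, 2), (5, 5)])` should select `(1, 2)` in every outer fold and report an outer-loop accuracy of 1.0, because the two informative genes separate the toy classes by a wide margin. Replacing `knn_predict` with an MC-SVM would leave the nested structure unchanged; only the model fitted inside `accuracy` would differ.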