Accurate Cancer Classification Using Expressions of Very Few Genes

Authors:
Lipo Wang;Feng Chu;Wei Xie
Affiliations:
-;-;-
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2007

Abstract

We aim to find the smallest set of genes that can ensure highly accurate classification of cancers from microarray data using supervised machine learning algorithms. The significance of finding the minimum gene subsets is threefold: 1) It greatly reduces the computational burden and "noise" arising from irrelevant genes. In the examples studied in this paper, finding the minimum gene subsets even allows for extraction of simple diagnostic rules that lead to accurate diagnosis without the need for any classifiers. 2) It simplifies gene expression tests to include only a very small number of genes rather than thousands of genes, which can significantly reduce the cost of cancer testing. 3) It calls for further investigation into the possible biological relationship between these small numbers of genes and cancer development and treatment. Our simple yet very effective method involves two steps. In the first step, we choose some important genes using a feature importance ranking scheme. In the second step, we test the classification capability of all simple combinations of those important genes using a good classifier. For three "small" and "simple" data sets with two, three, and four cancer (sub)types, our approach obtained very high accuracy with only two or three genes. For a "large" and "complex" data set with 14 cancer types, we divided the whole problem into a group of binary classification problems and applied the two-step approach to each of these binary classification problems. Through this "divide-and-conquer" approach, we obtained accuracy comparable to previously reported results but with only 28 genes rather than 16,063 genes. In general, our method can significantly reduce the number of genes required for highly reliable diagnosis.
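
The two-step procedure described in the abstract can be sketched roughly as follows. The abstract does not name the ranking scheme or the classifier, so this sketch assumes a simple mean-difference score for two-class ranking and a nearest-centroid classifier evaluated by leave-one-out cross-validation. The function names, the toy data, and the parameters `top_k`, `max_size`, and `target` are hypothetical illustrations, not the authors' implementation.

```python
from itertools import combinations
from statistics import mean, pstdev


def rank_genes(X, y, top_k):
    """Step 1 (assumed scoring): rank genes by class-mean separation, two classes only."""
    a, b = sorted(set(y))
    scores = []
    for g in range(len(X[0])):
        va = [row[g] for row, label in zip(X, y) if label == a]
        vb = [row[g] for row, label in zip(X, y) if label == b]
        spread = pstdev(va) + pstdev(vb) + 1e-12  # avoid division by zero
        scores.append((abs(mean(va) - mean(vb)) / spread, g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:top_k]]


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given genes."""
    correct = 0
    for i in range(len(X)):
        # Group the training samples (all except sample i) by class label.
        groups = {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j != i:
                groups.setdefault(label, []).append([row[g] for g in genes])
        centroids = {c: [mean(col) for col in zip(*rows)] for c, rows in groups.items()}
        point = [X[i][g] for g in genes]
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        if predicted == y[i]:
            correct += 1
    return correct / len(X)


def smallest_accurate_subset(X, y, top_k=3, max_size=3, target=1.0):
    """Step 2: test all combinations of the top-ranked genes, smallest subsets first."""
    candidates = rank_genes(X, y, top_k)
    best = (0.0, ())
    for size in range(1, max_size + 1):
        for subset in combinations(candidates, size):
            acc = loo_accuracy(X, y, subset)
            if acc > best[0]:
                best = (acc, subset)
        if best[0] >= target:
            break  # stop at the smallest subset size that reaches the target
    return best


# Toy expression matrix (made up): rows are samples, columns are genes.
X = [
    [5.0, 1.0, 0.1, 3.0],
    [4.0, 2.0, 0.2, 3.1],
    [5.5, 1.5, 0.0, 2.9],
    [4.5, 1.2, 0.9, 3.0],
    [5.2, 1.8, 1.0, 3.2],
    [4.8, 1.1, 1.1, 2.8],
]
y = ["A", "A", "A", "B", "B", "B"]

accuracy, genes = smallest_accurate_subset(X, y)
print(accuracy, genes)
```

In this sketch the ranking uses all samples before cross-validation, which can make the accuracy estimate optimistic; a more careful evaluation would repeat the ranking inside each fold. Because the scoring function handles only two classes, a multi-class problem would first be split into binary problems, in the spirit of the divide-and-conquer approach the abstract describes.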