We aim to find the smallest set of genes that still ensures highly accurate classification of cancers from microarray data using supervised machine learning algorithms. The significance of finding minimum gene subsets is threefold: 1) It greatly reduces the computational burden and the "noise" arising from irrelevant genes. In the examples studied in this paper, finding the minimum gene subsets even allows simple diagnostic rules to be extracted, which lead to accurate diagnosis without the need for any classifier. 2) It simplifies gene expression tests to include only a very small number of genes rather than thousands, which can significantly reduce the cost of cancer testing. 3) It calls for further investigation into the possible biological relationship between these small sets of genes and cancer development and treatment. Our simple yet very effective method involves two steps. In the first step, we choose some important genes using a feature importance ranking scheme. In the second step, we test the classification capability of all simple combinations of those important genes using a good classifier. For three "small" and "simple" data sets with two, three, and four cancer (sub)types, our approach obtained very high accuracy with only two or three genes. For a "large" and "complex" data set with 14 cancer types, we divided the whole problem into a group of binary classification problems and applied the two-step approach to each of them. Through this "divide-and-conquer" approach, we obtained accuracy comparable to previously reported results, but with only 28 genes rather than 16,063. In general, our method can significantly reduce the number of genes required for highly reliable diagnosis.
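The two-step procedure described above can be illustrated with a minimal sketch for a single binary problem. The abstract does not name the ranking scheme or the classifier, so everything specific here is an assumption made for illustration: a t-like score stands in for the feature importance ranking, a nearest-centroid classifier scored by leave-one-out accuracy stands in for the "good classifier", and the function names (`rank_genes`, `loo_accuracy`, `smallest_gene_subset`), the parameters, and the toy expression matrix are all hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance


def rank_genes(X, y, top_k):
    """Step 1 (assumed scheme): rank genes by a two-class t-like score."""
    a, b = sorted(set(y))  # this sketch assumes exactly two classes
    scores = []
    for g in range(len(X[0])):
        va = [row[g] for row, lab in zip(X, y) if lab == a]
        vb = [row[g] for row, lab in zip(X, y) if lab == b]
        denom = (pvariance(va) / len(va) + pvariance(vb) / len(vb)) ** 0.5
        scores.append(abs(mean(va) - mean(vb)) / (denom + 1e-12))
    order = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return order[:top_k]


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier (assumed classifier).

    Assumes every class has at least two samples, so no centroid is empty.
    """
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for lab in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == lab]
            centroids[lab] = [mean(r[g] for r in rows) for g in genes]
        point = [X[i][g] for g in genes]
        pred = min(
            centroids,
            key=lambda lab: sum((p - c) ** 2 for p, c in zip(point, centroids[lab])),
        )
        correct += pred == y[i]
    return correct / len(X)


def smallest_gene_subset(X, y, top_k=4, max_size=3, target=1.0):
    """Step 2: test all combinations of the top-ranked genes, smallest first."""
    candidates = rank_genes(X, y, top_k)
    best_acc, best_genes = 0.0, ()
    for size in range(1, max_size + 1):
        for subset in combinations(candidates, size):
            acc = loo_accuracy(X, y, subset)
            if acc > best_acc:
                best_acc, best_genes = acc, subset
        if best_acc >= target:
            break  # a smaller subset already reaches the target accuracy
    return best_acc, best_genes


# Toy data (made up): 6 samples x 4 genes; only gene 2 separates the classes.
X = [
    [5.1, 0.2, 1.0, 3.3],
    [4.9, 0.8, 1.2, 3.1],
    [5.0, 0.5, 0.9, 3.6],
    [5.2, 0.3, 4.0, 3.2],
    [4.8, 0.7, 4.2, 3.5],
    [5.0, 0.4, 3.9, 3.4],
]
y = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

acc, genes = smallest_gene_subset(X, y)
print(acc, genes)
```

Searching subsets in order of increasing size means the first subset size that reaches the target accuracy is also the smallest one among the candidates. For a multi-class data set, the divide-and-conquer idea in the abstract would correspond to running such a search once per binary subproblem; how the subproblems are formed and how their results are combined is not specified here and is left out of the sketch.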