ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part III
Microarray data classification is a task involving high dimensionality and small sample sizes. A common criterion for deciding how many genes to select is maximizing accuracy, which risks overfitting and usually selects more genes than are actually needed. Relaxing the maximum-accuracy criterion, we propose selecting the combination of attribute selection and classification algorithm that uses the fewest attributes while achieving an accuracy not statistically significantly worse than the best. We also give advice on choosing a suitable combination of attribute selection and classification algorithms to obtain good accuracy from a low number of gene expressions. We applied several well-known attribute selection methods (FCBF, ReliefF, and SVM-RFE, plus random selection as a baseline) and classification techniques (Naive Bayes, 3-Nearest Neighbor, and SVM with a linear kernel) to 30 data sets involving different cancer types.
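The relaxed criterion described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' exact procedure: the `results` layout, the function names, and the use of a paired t-statistic with a fixed critical value (about 2.262 for 9 degrees of freedom at a two-sided alpha of 0.05, matching 10 cross-validation folds) are all assumptions; the paper's actual statistical test may differ.

```python
import math

def paired_t(a, b):
    # Paired t-statistic between two lists of per-fold accuracies.
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    if var == 0:
        # No variance: identical results give t = 0; a constant
        # nonzero gap is treated as an arbitrarily large statistic.
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)

def select_smallest_equivalent(results, t_crit=2.262):
    """Illustrative selection rule (assumed interface, not the paper's code).

    results: dict mapping (selector, classifier, n_genes) to a list of
    per-fold accuracies. Returns the configuration with the fewest genes
    whose accuracy is not statistically significantly worse than the
    best-scoring configuration.
    """
    best_key = max(results, key=lambda k: sum(results[k]) / len(results[k]))
    best = results[best_key]
    # Keep every configuration whose deficit versus the best is not
    # significant at the chosen critical value.
    candidates = [k for k, accs in results.items()
                  if paired_t(best, accs) < t_crit]
    # Among the statistically equivalent configurations, prefer the one
    # that uses the fewest genes (third element of the key).
    return min(candidates, key=lambda k: k[2])
```

For example, a configuration reaching the same fold accuracies as the best one but with 10 genes instead of 100 would be preferred, while a clearly weaker configuration with 5 genes would be excluded by the significance test.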