Gene selection by sequential search wrapper approaches in microarray cancer class prediction

Authors:
Iñaki Inza; Basilio Sierra; Rosa Blanco; Pedro Larrañaga
Affiliations:
Department of Computer Science and Artificial Intelligence, University of the Basque Country, P.O. Box 649, E-20080 Donostia-San Sebastián, Basque Country, Spain. Tel.: +34 943015026/ Fax: +3 ... (all four authors)
Venue:
Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology - Challenges for future intelligent systems in biomedicine
Year:
2002

Citing: 0
Cited by: 13

Improving classification of microarray data using prototype-based feature selection
ACM SIGKDD Explorations Newsletter

Feature mining and pattern classification for steganalysis of LSB matching steganography in grayscale images
Pattern Recognition

Feature selection and classification model construction on type 2 diabetic patients' data
Artificial Intelligence in Medicine

Structural Risk Minimisation based gene expression profiling analysis
International Journal of Bioinformatics Research and Applications

Gene identification and survival prediction with Lp Cox regression and novel similarity measure
International Journal of Data Mining and Bioinformatics

Sparse Support Vector Machines with L_{p} Penalty for Biomarker Identification
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Recursive Mahalanobis Separability Measure for Gene Subset Selection
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Rough set based maximum relevance-maximum significance criterion and Gene selection from microarray data
International Journal of Approximate Reasoning

Combining feature selection and feature construction to improve concept learning for high dimensional data
SARA'05 Proceedings of the 6th international conference on Abstraction, Reformulation and Approximation

Selective Gaussian naïve Bayes model for diffuse large-B-cell lymphoma classification: some improvements in preprocessing and variable elimination
ECSQARU'05 Proceedings of the 8th European conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty

Combining Bayesian networks, k nearest neighbours algorithm and attribute selection for gene expression data analysis
AI'04 Proceedings of the 17th Australian joint conference on Advances in Artificial Intelligence

Filter versus wrapper gene selection approaches in DNA microarray domains
Artificial Intelligence in Medicine

MP3 audio steganalysis
Information Sciences: an International Journal

Abstract

In recent years, gene expression profiling technologies have grown rapidly and are expected to provide insight into cancer-related cellular processes. Machine Learning algorithms, although extensively applied in many real-world areas, are still not popular in the Bioinformatics community. We report on the successful application of four well-known supervised Machine Learning methods (IB1, Naive-Bayes, C4.5 and CN2) to cancer class prediction problems in three DNA microarray datasets of huge dimensionality (Colon, Leukemia and NCI-60). Gene selection, an essential step in microarray domains, is performed by a sequential search engine that evaluates the goodness of each gene subset with a wrapper approach: the supervised algorithm is run in a leave-one-out process to estimate its accuracy on that subset. With this gene selection procedure, the accuracy of the supervised algorithms improves significantly and the number of genes in the classification models is notably reduced for all datasets.
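
The wrapper idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a greedy sequential forward search, stands in for IB1 with a plain 1-nearest-neighbour rule under squared Euclidean distance, and stops as soon as no single added gene improves the leave-one-out accuracy. The function names and the tiny dataset are hypothetical and only serve the illustration.

```python
import math


def loo_accuracy_1nn(X, y, genes):
    """Leave-one-out accuracy of a 1-NN classifier restricted to `genes`.

    Assumption: 1-NN with squared Euclidean distance stands in for IB1.
    """
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = math.inf, None
        for j in range(len(X)):
            if i == j:
                continue  # the held-out sample is never its own neighbour
            dist = sum((X[i][g] - X[j][g]) ** 2 for g in genes)
            if dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def sequential_forward_selection(X, y, score=loo_accuracy_1nn):
    """Greedy forward search: add the gene that most improves the wrapper score.

    Assumption: the search stops when no candidate gene strictly improves
    the current leave-one-out accuracy.
    """
    n_genes = len(X[0])
    selected, best = [], 0.0
    while len(selected) < n_genes:
        candidates = [g for g in range(n_genes) if g not in selected]
        scored = [(score(X, y, selected + [g]), g) for g in candidates]
        # Highest score wins; ties go to the lowest gene index.
        top_score, top_gene = max(scored, key=lambda t: (t[0], -t[1]))
        if top_score <= best:
            break
        selected.append(top_gene)
        best = top_score
    return selected, best


# Hypothetical toy data: 6 samples x 4 genes; only gene 2 separates the classes.
X = [
    [0.9, 0.1, 1.0, 0.5],
    [0.2, 0.8, 1.2, 0.4],
    [0.7, 0.3, 0.9, 0.6],
    [0.3, 0.2, 3.0, 0.5],
    [0.8, 0.9, 3.2, 0.4],
    [0.1, 0.4, 2.9, 0.6],
]
y = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

selected, accuracy = sequential_forward_selection(X, y)
print(selected, accuracy)  # prints: [2] 1.0
```

Because the classifier is passed in through `score`, the same search loop could wrap any other supervised learner. Each candidate subset costs one full leave-one-out run, which is why wrapper approaches are expensive on datasets with thousands of genes.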