Gene selection by sequential search wrapper approaches in microarray cancer class prediction

  • Authors:
  • Iñaki Inza; Basilio Sierra; Rosa Blanco; Pedro Larrañaga

  • Affiliations:
  • Department of Computer Science and Artificial Intelligence, University of the Basque Country, P.O. Box 649, E-20080 Donostia-San Sebastián, Basque Country, Spain. Tel.: +34 943015026/ Fax: +3 ... (all authors)

  • Venue:
  • Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology - Challenges for future intelligent systems in biomedicine
  • Year:
  • 2002

Abstract

In recent years, gene expression profiling technologies have grown rapidly and are expected to provide insight into cancer-related cellular processes. Machine Learning algorithms, although extensively applied in many real-world areas, are still not popular in the Bioinformatics community. We report on the successful application of four well-known supervised Machine Learning methods (IB1, Naive-Bayes, C4.5 and CN2) to cancer class prediction problems in three DNA microarray datasets of huge dimensionality (Colon, Leukemia and NCI-60). Gene selection, an essential step in microarray domains, is performed by a sequential search engine that evaluates the goodness of each gene subset with a wrapper approach: the supervised algorithm is run in a leave-one-out process to estimate its accuracy on that subset. With this gene selection procedure, the accuracy of the supervised algorithms is significantly improved and the number of genes in the classification models is notably reduced for all datasets.
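
The wrapper idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a greedy forward search that stops as soon as no candidate gene improves the estimate, uses a plain 1-nearest-neighbour classifier as a stand-in for IB1, and runs on a tiny made-up expression matrix. The function names `loo_accuracy_1nn` and `sequential_forward_selection` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def loo_accuracy_1nn(X: Sequence[Sequence[float]], y: Sequence[int],
                     genes: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-NN classifier that only sees `genes`."""
    n = len(X)
    correct = 0
    for i in range(n):  # hold out sample i
        best_dist, best_label = float("inf"), None
        for j in range(n):
            if j == i:
                continue
            # Squared Euclidean distance over the selected genes only.
            dist = sum((X[i][g] - X[j][g]) ** 2 for g in genes)
            if dist < best_dist:
                best_dist, best_label = dist, y[j]
        if best_label == y[i]:
            correct += 1
    return correct / n


def sequential_forward_selection(
        X: Sequence[Sequence[float]], y: Sequence[int],
        max_genes: Optional[int] = None) -> Tuple[List[int], float]:
    """Greedy forward search; each subset is scored by the wrapper above.

    Assumed stopping rule: halt when no single added gene strictly
    improves the leave-one-out accuracy.
    """
    n_genes = len(X[0])
    limit = n_genes if max_genes is None else max_genes
    selected: List[int] = []
    best_acc = 0.0
    while len(selected) < limit:
        candidates = [g for g in range(n_genes) if g not in selected]
        scored = [(loo_accuracy_1nn(X, y, selected + [g]), g)
                  for g in candidates]
        # Highest accuracy wins; ties go to the lowest gene index.
        acc, gene = max(scored, key=lambda t: (t[0], -t[1]))
        if acc <= best_acc:
            break
        selected.append(gene)
        best_acc = acc
    return selected, best_acc


# Made-up toy data: 6 samples x 4 genes; only gene index 2 separates the classes.
X = [
    [0.9, 0.1, 0.10, 0.5],
    [0.2, 0.8, 0.20, 0.4],
    [0.5, 0.5, 0.15, 0.9],
    [0.8, 0.2, 0.90, 0.5],
    [0.1, 0.9, 0.80, 0.3],
    [0.6, 0.4, 0.85, 0.8],
]
y = [0, 0, 0, 1, 1, 1]

selected, acc = sequential_forward_selection(X, y)
print(selected, acc)
```

On this toy matrix the search keeps only gene index 2, since that single gene already yields a perfect leave-one-out estimate and no further gene can improve on it. With thousands of genes, as in real microarray data, each search step costs one leave-one-out run per candidate gene, which is why the choice of search strategy and stopping rule matters in practice.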