Conquering the curse of dimensionality in gene expression cancer diagnosis: tough problem, simple models

  • Authors:
  • Minca Mramor;Gregor Leban;Janez Demšar;Blaž Zupan

  • Affiliations:
  • Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia;Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia;Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia;Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia

  • Venue:
  • AIME'05 Proceedings of the 10th conference on Artificial Intelligence in Medicine
  • Year:
  • 2005

Abstract

In this paper, we study the properties of cancer gene expression data sets from the perspective of classification and tumor diagnosis. Our findings and case studies are based on several recently published data sets. We find that these data sets typically include a subset of about 100 highly discriminating features, whose predictive power can be further enhanced by exploring their interactions. This finding speaks against the often-used univariate feature selection methods, and may explain the superior performance of support vector machines recently reported in related work. We argue that a much simpler technique, one that directly finds visualizations with a clear separation of diagnostic classes, may be used instead. Furthermore, it may perform better at inferring an understandable classifier that includes only a few relevant features.
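The point about feature interactions can be illustrated with a small toy sketch. It is not the authors' method or data: the two binarized "genes", the XOR labeling, and the helper names `univariate_accuracy` and `pair_accuracy` are assumptions made up for this example. It shows a case where each feature scored on its own looks useless, while the pair considered jointly separates the classes perfectly.

```python
from itertools import product

# Hypothetical toy data (an assumption, not from the paper):
# each row is (gene1, gene2, label), with label = gene1 XOR gene2.
samples = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2)] * 5


def univariate_accuracy(data, idx):
    """Best accuracy of a rule that predicts the label from one feature alone."""
    n = len(data)
    agree = sum(1 for row in data if row[idx] == row[2])
    # Either "label = feature" or "label = not feature", whichever is better.
    return max(agree, n - agree) / n


def pair_accuracy(data):
    """Accuracy of a majority-vote lookup table over both features jointly."""
    counts = {}
    for a, b, y in data:
        counts.setdefault((a, b), [0, 0])[y] += 1
    correct = sum(max(c) for c in counts.values())
    return correct / len(data)


acc_g1 = univariate_accuracy(samples, 0)
acc_g2 = univariate_accuracy(samples, 1)
acc_pair = pair_accuracy(samples)
print(acc_g1, acc_g2, acc_pair)
```

In this contrived case a univariate filter would rank both features at chance level and could discard them, even though together they determine the class. Real expression data is noisier and continuous, so the effect would be a matter of degree rather than this clean extreme.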