Learning and feature selection using the set covering machine with data-dependent rays on gene expression profiles

  • Authors:
  • Hans A. Kestler; Wolfgang Lindner; André Müller

  • Affiliations:
  • Neural Information Processing, University of Ulm, Ulm; Theoretical Computer Science, University of Ulm, Ulm, Germany; Internal Medicine I, University Hospital Ulm, Ulm

  • Venue:
  • ANNPR'06 Proceedings of the Second International Conference on Artificial Neural Networks in Pattern Recognition
  • Year:
  • 2006

Abstract

Microarray technologies are increasingly used in the biological and medical sciences for high-throughput analyses of genetic information at the genome, transcriptome and proteome levels. Differentiating between cancerous and benign processes in the body often poses a difficult diagnostic problem in the clinical setting, yet it is of major importance for the treatment of patients. In this situation, feature reduction techniques that lower the dimensionality of the data are essential for building predictive tools based on classification. We extend the set covering machine of Marchand and Shawe-Taylor to data-dependent rays in order to achieve feature reduction and a direct interpretation of the resulting conjunctions of intervals on individual genes. We give bounds on the generalization error as a function of the amount of data compression and the number of training errors. In experiments with artificial data and a real-world data set of gene expression profiles from the pancreas, we show the utility of the approach and its applicability to microarray data classification.
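
To make the idea of a conjunction of rays on individual genes more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the greedy selection heuristic of the original set covering machine (pick the feature that covers the most remaining negative examples, minus a penalty for misclassified positive examples) and assumes that ray thresholds are taken from the training values themselves. The names `make_rays`, `ray_output`, `train_scm`, `predict`, `penalty` and `max_rays` are hypothetical, and the toy data is invented purely for illustration.

```python
def make_rays(X):
    """Enumerate candidate rays as (gene index, threshold, direction).

    Assumption: thresholds are the values observed in the training data,
    which is one way to make the rays data-dependent.
    """
    rays = []
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            rays.append((j, t, +1))  # ray "x[j] >= t"
            rays.append((j, t, -1))  # ray "x[j] <= t"
    return rays


def ray_output(ray, x):
    """Return True if sample x lies on the ray."""
    j, t, direction = ray
    return x[j] >= t if direction > 0 else x[j] <= t


def train_scm(X, y, penalty=1.0, max_rays=5):
    """Greedily build a conjunction of rays (labels: 1 positive, 0 negative)."""
    rays = make_rays(X)
    neg = {i for i, label in enumerate(y) if label == 0}
    pos = {i for i, label in enumerate(y) if label == 1}
    chosen = []
    while neg and len(chosen) < max_rays:
        best, best_score, best_q, best_r = None, None, set(), set()
        for ray in rays:
            # Negatives rejected by the ray are "covered"; rejected
            # positives are errors and are penalized.
            q = {i for i in neg if not ray_output(ray, X[i])}
            r = {i for i in pos if not ray_output(ray, X[i])}
            score = len(q) - penalty * len(r)
            if best_score is None or score > best_score:
                best, best_score, best_q, best_r = ray, score, q, r
        if not best_q:
            break  # no ray covers any remaining negative example
        chosen.append(best)
        neg -= best_q
        pos -= best_r
    return chosen


def predict(chosen, x):
    """A sample is positive only if it lies on every selected ray."""
    return int(all(ray_output(ray, x) for ray in chosen))


# Toy data (invented): positives have gene 0 inside [2, 3]; gene 1 is noise.
X = [[0.0, 5.0], [1.0, 1.0], [2.0, 4.0], [2.5, 0.5],
     [3.0, 3.0], [4.0, 2.0], [5.0, 4.5]]
y = [0, 0, 1, 1, 1, 0, 0]

model = train_scm(X, y)
selected_genes = {j for j, _, _ in model}
```

On this toy data the two selected rays act on the same gene, so their conjunction reads as an interval on that gene, and `selected_genes` lists the genes that survive the feature reduction. The exhaustive search over all rays is chosen for readability only; it would need a more efficient search to scale to thousands of genes.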