Boosting Threshold Classifiers for High-Dimensional Data in Functional Genomics

  • Authors:
  • Ludwig Lausser;Malte Buchholz;Hans A. Kestler

  • Affiliations:
  • Department of Internal Medicine I, University Hospital Ulm, Germany;Internal Medicine, SP Gastroenterology, University Hospital Marburg, Germany;Department of Internal Medicine I, University Hospital Ulm, Germany and Institute of Neural Information Processing, University of Ulm, Germany

  • Venue:
  • ANNPR '08 Proceedings of the 3rd IAPR workshop on Artificial Neural Networks in Pattern Recognition
  • Year:
  • 2008

Abstract

Diagnosis of disease based on the classification of DNA microarray gene expression profiles of clinical samples is a promising novel approach to improving the performance and accuracy of current routine diagnostic procedures. In many applications, ensembles outperform single classifiers. In a clinical setting, a combination of simple classification rules, such as single threshold classifiers on individual gene expression values, may provide valuable insights and facilitate the diagnostic process. A boosting algorithm can build such decision rules by using single threshold classifiers as base classifiers. AdaBoost can be seen as the predecessor of many later boosting algorithms; unfortunately, its performance degrades on high-dimensional data. Here we compare extensions of AdaBoost, namely MultiBoost, MadaBoost and AdaBoost-VC, in cross-validation experiments on noisy high-dimensional artificial and real data sets. The artificial data sets are constructed so that the features relevant for the class distinction can easily be read out. Our special interest is in the features the ensembles select for classification and how many of them are actually related to the original class distinction.
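
As a rough illustration of the base-classifier idea described in the abstract, the following Python sketch runs plain AdaBoost over single threshold classifiers (one feature, one cut-off, one direction) and reports which features the resulting ensemble uses. It is not the authors' implementation and does not cover MultiBoost, MadaBoost or AdaBoost-VC; the function names, the midpoint threshold search, the error clipping and the toy data are all assumptions made for illustration.

```python
import math

# Illustrative sketch only: plain AdaBoost with single threshold classifiers.
# All names and the toy data below are hypothetical, not taken from the paper.

def stump_predict(stump, row):
    """Return +1 or -1 from a rule (feature index, threshold, sign)."""
    j, t, sign = stump
    return sign if row[j] > t else -sign

def train_stump(X, y, w):
    """Exhaustively pick the threshold rule with the lowest weighted error."""
    best_err, best_stump = None, None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for sign in (1, -1):
                stump = (j, t, sign)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    if best_stump is None:
        raise ValueError("no feature has two distinct values")
    return best_err, best_stump

def adaboost(X, y, rounds):
    """Return a list of (alpha, stump) pairs; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, stump = train_stump(X, y, w)
        # Clip the error so the logarithm below stays finite (an assumption).
        err = min(max(err, 1e-10), 1 - 1e-10)
        if err >= 0.5:
            break  # base classifier is no better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted majority vote of all threshold classifiers."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

def selected_features(ensemble):
    """Indices of the features used by at least one ensemble member."""
    return sorted({stump[0] for _, stump in ensemble})

# Toy data: only feature 1 separates the two classes; 0 and 2 are noise.
X_toy = [[0.3, 1.0, 5.0], [0.9, 1.2, 4.0], [0.1, 0.8, 6.0],
         [0.7, 3.1, 5.5], [0.2, 2.9, 4.2], [0.8, 3.3, 6.1]]
y_toy = [-1, -1, -1, 1, 1, 1]

toy_ensemble = adaboost(X_toy, y_toy, rounds=3)
print(selected_features(toy_ensemble))
```

Inspecting `selected_features` mirrors the question raised in the abstract: on artificial data where the relevant features are known, one can check how many of the selected features are among them.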