Finding optimal classifiers for small feature sets in genomics and proteomics

  • Authors: Gregor Stiglic; Juan J. Rodriguez; Peter Kokol

  • Affiliations: University of Maribor, Faculty of Health Sciences, Zitna ulica 15, 2000 Maribor, Slovenia; University of Burgos, c/ Francisco de Vitoria s/n, 09006 Burgos, Spain; University of Maribor, Faculty of Health Sciences, Zitna ulica 15, 2000 Maribor, Slovenia and University of Maribor, Faculty of Electrical Engineering and Computer Science, Smetanova 17, 2000 Mari ...

  • Venue: Neurocomputing
  • Year: 2010

Abstract

Classifying genomic and proteomic data in extremely high-dimensional datasets is a well-known problem that requires appropriate classification techniques. Classification methods are usually combined with gene selection techniques to provide optimal classification conditions, i.e., a lower-dimensional classification environment. Another reason for reducing the dimensionality of such datasets is interpretability: a small set of ranked genes is much easier to interpret than 20 thousand genes. This paper evaluates the classification performance of the Rotation Forest classifier on small subsets of ranked genes for two dataset collections consisting of 47 genomic and proteomic classification problems. Robustness and high classification accuracy are shown to be important features of Rotation Forest when applied to small sets of genes.
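
The workflow described in the abstract (rank genes, keep a small top-ranked subset, then classify on that subset) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the data are synthetic, the ranking uses a simple two-class signal-to-noise score, and a nearest-centroid classifier stands in for Rotation Forest, which is not implemented. The helper names (`make_data`, `rank_features`, `fit_centroids`, `predict`) are hypothetical.

```python
import random
import statistics


def make_data(n_samples, n_features, n_informative, rng):
    """Synthetic two-class data: only the first n_informative features carry signal."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 2.0 * label  # shift informative features for class 1
        X.append(row)
        y.append(label)
    return X, y


def rank_features(X, y):
    """Return feature indices sorted by a signal-to-noise score, best first."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = (statistics.pstdev(a) + statistics.pstdev(b)) or 1e-12
        scores.append(abs(statistics.fmean(a) - statistics.fmean(b)) / spread)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def fit_centroids(X, y, features):
    """Per-class mean vector restricted to the selected features."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.fmean(r[j] for r in rows) for j in features]
    return centroids


def predict(centroids, row, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    point = [row[j] for j in features]
    return min(
        centroids,
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
    )


rng = random.Random(0)
X_train, y_train = make_data(40, 200, 5, rng)
X_test, y_test = make_data(40, 200, 5, rng)

# Rank on the training set only, so the test set does not influence selection.
ranking = rank_features(X_train, y_train)

accuracy_by_k = {}
for k in (1, 2, 5, 10):
    top_k = ranking[:k]
    centroids = fit_centroids(X_train, y_train, top_k)
    hits = sum(
        predict(centroids, row, top_k) == label
        for row, label in zip(X_test, y_test)
    )
    accuracy_by_k[k] = hits / len(y_test)

print(accuracy_by_k)
```

Ranking on the training split alone is a deliberate choice in this sketch: selecting genes with the test samples included would bias the accuracy estimate upward.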