Feature Selection and Classification for Small Gene Sets

  • Authors:
  • Gregor Stiglic; Juan J. Rodriguez; Peter Kokol

  • Affiliations:
  • Faculty of Health Sciences, University of Maribor, Maribor, Slovenia 2000 and Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia 2000; University of Burgos, Burgos, Spain 09006; Faculty of Health Sciences, University of Maribor, Maribor, Slovenia 2000 and Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia 2000

  • Venue:
  • PRIB '08 Proceedings of the Third IAPR International Conference on Pattern Recognition in Bioinformatics
  • Year:
  • 2008

Abstract

Random Forests, Support Vector Machines and k-Nearest Neighbors are proven classification techniques that are widely used for many kinds of classification problems. One such problem is the classification of genomic and proteomic data, which is known for its extremely high dimensionality and therefore demands suitable classification techniques. In this domain, classifiers are usually combined with gene selection techniques to obtain the best possible classification accuracy. Another reason for reducing the dimensionality of such datasets is interpretability: it is much easier to interpret a small set of ranked genes than 20 or 30 thousand unordered genes. In this paper we present a classification ensemble of decision trees called Rotation Forest and evaluate its classification performance on small subsets of ranked genes for 14 genomic and proteomic classification problems. The results demonstrate an important feature of Rotation Forest: robustness and high classification accuracy when using small sets of genes.
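
The general workflow the abstract describes (rank genes, keep a small top-ranked subset, then classify) can be illustrated with a minimal sketch. It is not the authors' method. The abstract does not name the ranking criterion, so a signal-to-noise score is assumed here. A plain k-Nearest Neighbors vote stands in for the classifier, and Rotation Forest itself is not implemented. The data are synthetic, and the function names (`rank_genes`, `knn_predict`, `loo_accuracy`, `make_data`) are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def rank_genes(X, y):
    """Return gene indices sorted by decreasing |signal-to-noise| score.

    Assumed criterion: |mean_1 - mean_0| / (std_1 + std_0) per gene.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row in pos]
        b = [row[g] for row in neg]
        spread = pstdev(a) + pstdev(b) + 1e-9  # guard against division by zero
        scores.append(abs(mean(a) - mean(b)) / spread)
    return sorted(range(len(scores)), key=lambda g: -scores[g])


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def loo_accuracy(X, y, n_genes, k=3):
    """Leave-one-out accuracy using only the top n_genes ranked genes.

    Genes are re-ranked inside every fold, so the held-out sample never
    influences which genes are selected.
    """
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        top = rank_genes(train_X, train_y)[:n_genes]
        reduced = [[row[g] for g in top] for row in train_X]
        query = [X[i][g] for g in top]
        hits += knn_predict(reduced, train_y, query, k) == y[i]
    return hits / len(X)


def make_data(n_samples=40, n_genes=200, n_informative=5, seed=0):
    """Synthetic two-class data; only the first n_informative genes carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 2.0 * label  # shift informative genes for class 1
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    for n in (2, 5, 20):
        print(n, "genes -> leave-one-out accuracy:", loo_accuracy(X, y, n))
```

Ranking inside each cross-validation fold is a deliberate choice in this sketch: selecting genes on the full dataset before evaluation would let the test sample leak into the selection step and inflate the accuracy estimate.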