Forest classification trees and forest support vector machines algorithms: Demonstration using microarray data

  • Authors:
  • Elias Zintzaras; Axel Kowald

  • Affiliations:
  • Department of Biomathematics, University of Thessaly School of Medicine, Larissa, Greece and Department of Informatics, University of Piraeus, Piraeus, Greece and Institute for Clinical Research a ...;Department of Biomathematics, University of Thessaly School of Medicine, Larissa, Greece and Department of Theoretical Biophysics, Humboldt University of Berlin, Berlin, Germany

  • Venue:
  • Computers in Biology and Medicine
  • Year:
  • 2010

Abstract

Classification into multiple classes when the measured variables outnumber the samples is a major methodological challenge in -omics studies. Two algorithms that overcome this dimensionality problem are presented: the forest classification tree (FCT) and the forest support vector machines (FSVM). In FCT, a set of variables is chosen at random and a classification tree (CT) is grown using a forward classification algorithm. The process is repeated to derive a forest of CTs. Finally, the variables that occur most frequently in the trees with the smallest apparent misclassification rate (AMR) are used to construct a productive tree. In FSVM, the CTs are replaced by SVMs. The methods are demonstrated using prostate gene expression data to classify tissue samples into four tumor types. With a threshold split value of 0.001 and 100 markers, the productive CT consisted of 29 terminal nodes and achieved perfect classification (AMR=0). When the threshold value was set to 0.01, a tree with 17 terminal nodes was constructed from 15 markers (AMR=7%). In FSVM, reducing the fraction of the forest used to construct the best classifier from the top 80% to the top 20% reduced the misclassification rate to 25% (when using 200 markers). The proposed methodologies may be used to identify important variables in high-dimensional data. Furthermore, the FCT allows the data structure to be explored and provides a decision rule.
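The FCT procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the splitting criterion, so the sketch assumes greedy Gini-impurity splits and treats the threshold split value as the minimum impurity decrease required to split a node. It also assumes that "most frequent variables" means the variables actually used for splits in the best trees. All function and parameter names (`grow_tree`, `forest_classification_tree`, `subset_size`, `top_fraction`, `n_markers`) are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed splitting criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow_tree(X, y, features, threshold):
    """Grow a tree greedily; split only if the impurity decrease exceeds threshold."""
    n = len(y)
    parent = gini(y)
    best = None
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [i for i in range(n) if X[i][f] <= cut]
            right = [i for i in range(n) if X[i][f] > cut]
            child = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / n
            gain = parent - child
            if gain > threshold and (best is None or gain > best[0]):
                best = (gain, f, cut, left, right)
    if best is None:
        # Terminal node: predict the majority class.
        return {"leaf": Counter(y).most_common(1)[0][0]}
    _, f, cut, left, right = best
    return {
        "feature": f,
        "cut": cut,
        "left": grow_tree([X[i] for i in left], [y[i] for i in left], features, threshold),
        "right": grow_tree([X[i] for i in right], [y[i] for i in right], features, threshold),
    }


def predict(tree, row):
    """Follow the splits down to a terminal node."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["cut"] else tree["right"]
    return tree["leaf"]


def amr(tree, X, y):
    """Apparent misclassification rate: error on the data used to grow the tree."""
    return sum(predict(tree, row) != label for row, label in zip(X, y)) / len(y)


def used_features(tree):
    """Set of variables that appear in at least one split of the tree."""
    if "leaf" in tree:
        return set()
    return {tree["feature"]} | used_features(tree["left"]) | used_features(tree["right"])


def forest_classification_tree(X, y, n_trees=50, subset_size=3, top_fraction=0.2,
                               n_markers=2, threshold=0.001, seed=0):
    """Sketch of FCT: random-subset forest -> frequent variables -> final tree."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        subset = rng.sample(range(len(X[0])), subset_size)
        tree = grow_tree(X, y, subset, threshold)
        forest.append((amr(tree, X, y), tree))
    forest.sort(key=lambda pair: pair[0])  # smallest AMR first
    best_trees = forest[:max(1, int(top_fraction * n_trees))]
    counts = Counter(f for _, tree in best_trees for f in used_features(tree))
    markers = [f for f, _ in counts.most_common(n_markers)]
    return markers, grow_tree(X, y, markers, threshold)
```

Calling `forest_classification_tree(X, y)` with `X` as a list of numeric rows and `y` as a list of class labels returns the selected marker indices and the final tree, which `amr(tree, X, y)` can then score. Because the AMR is computed on the training data itself, a small threshold lets trees grow until they fit noise, so the ranking of trees in this sketch should be read with that limitation in mind. An FSVM-style variant would, in the same spirit, replace `grow_tree` with an SVM fitted on each random subset.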