Forest classification trees and forest support vector machines algorithms: Demonstration using microarray data

Authors:
Elias Zintzaras; Axel Kowald
Affiliations:
Department of Biomathematics, University of Thessaly School of Medicine, Larissa, Greece and Department of Informatics, University of Piraeus, Piraeus, Greece and Institute for Clinical Research a ...; Department of Biomathematics, University of Thessaly School of Medicine, Larissa, Greece and Department of Theoretical Biophysics, Humboldt University of Berlin, Berlin, Germany
Venue:
Computers in Biology and Medicine
Year:
2010

Abstract

Classification into multiple classes when the measured variables outnumber the samples is a major methodological challenge in -omics studies. Two algorithms that overcome this dimensionality problem are presented: the forest classification tree (FCT) and the forest support vector machines (FSVM). In FCT, a set of variables is chosen at random and a classification tree (CT) is grown on it using a forward classification algorithm. Repeating this process yields a forest of CTs. Finally, the variables that occur most frequently in the trees with the smallest apparent misclassification rate (AMR) are used to construct a productive tree. In FSVM, the CTs are replaced by SVMs. The methods are demonstrated on prostate gene expression data, classifying tissue samples into four tumor types. With a split threshold of 0.001 and 100 markers, the productive CT had 29 terminal nodes and achieved perfect classification (AMR = 0). With the threshold set to 0.01, a tree with 17 terminal nodes was constructed from 15 markers (AMR = 7%). In FSVM, reducing the fraction of the forest used to construct the best classifier from the top 80% to the top 20% reduced the misclassification rate to 25% (with 200 markers). The proposed methodologies may be used to identify important variables in high-dimensional data. Furthermore, the FCT allows the data structure to be explored and provides a decision rule.
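The forest procedure summarized in the abstract (random variable subsets, one classifier per subset, ranking by AMR, counting variables in the best-ranked fraction, refitting a productive classifier) can be sketched in pure Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the base learner is a generic greedy Gini tree standing in for their forward classification algorithm, the split threshold is interpreted as a minimum impurity decrease weighted by the node's share of samples, and all names (`forest`, `fit_tree`, `grow_tree`, ...), parameter defaults, and the toy data are hypothetical.

```python
import random
from collections import Counter
from functools import partial


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow_tree(X, y, variables, threshold, n_total):
    """Greedy binary tree that may only split on `variables`.

    Assumption (hypothetical reading of the split threshold): a node is split
    only if its Gini decrease, weighted by the node's share of all n_total
    samples, is at least `threshold`.
    """
    parent = gini(y)
    best = None  # (weighted gain, variable index, cut point)
    for v in variables:
        values = sorted({row[v] for row in X})
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[v] <= cut]
            right = [lab for row, lab in zip(X, y) if row[v] > cut]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            gain = len(y) / n_total * (parent - child)
            if best is None or gain > best[0]:
                best = (gain, v, cut)
    if best is None or best[0] < threshold:
        return ("leaf", Counter(y).most_common(1)[0][0])  # majority class
    _, v, cut = best
    branches = []
    for goes_left in (True, False):
        idx = [i for i, row in enumerate(X) if (row[v] <= cut) == goes_left]
        branches.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                  variables, threshold, n_total))
    return ("split", v, cut, branches[0], branches[1])


def predict(tree, row):
    while tree[0] == "split":
        _, v, cut, left, right = tree
        tree = left if row[v] <= cut else right
    return tree[1]


def tree_variables(tree):
    """Set of variable indices actually used in the tree's splits."""
    if tree[0] == "leaf":
        return set()
    _, v, _, left, right = tree
    return {v} | tree_variables(left) | tree_variables(right)


def fit_tree(X, y, variables, threshold=0.1):
    """FCT base learner: returns (predictor, variables it uses)."""
    tree = grow_tree(X, y, variables, threshold, len(y))
    return partial(predict, tree), tree_variables(tree)


def apparent_misclassification(predictor, X, y):
    """AMR: error rate on the same samples the model was fitted to."""
    return sum(predictor(row) != lab for row, lab in zip(X, y)) / len(y)


def forest(X, y, fit, n_models=100, subset_size=5, top_fraction=0.2,
           n_markers=3, seed=0):
    rng = random.Random(seed)
    ranked = []
    for _ in range(n_models):
        variables = rng.sample(range(len(X[0])), subset_size)
        predictor, used = fit(X, y, variables)
        ranked.append((apparent_misclassification(predictor, X, y), used))
    ranked.sort(key=lambda pair: pair[0])  # smallest AMR first
    top = ranked[:max(1, int(top_fraction * n_models))]
    counts = Counter(v for _, used in top for v in used)
    markers = [v for v, _ in counts.most_common(n_markers)]
    productive, _ = fit(X, y, markers)  # refit on the most frequent variables
    return markers, productive


# Hypothetical toy data: 60 samples, 20 variables, 3 classes;
# only variable 0 carries class information.
data_rng = random.Random(1)
X, y = [], []
for i in range(60):
    label = i % 3
    row = [data_rng.random() for _ in range(20)]
    row[0] = label + 0.1 * data_rng.random()
    X.append(row)
    y.append(label)

markers, productive = forest(X, y, fit_tree)
print(markers, apparent_misclassification(productive, X, y))
```

One way to sketch the FSVM variant is to keep `forest` unchanged and pass a different `fit` that trains an SVM (for example scikit-learn's `SVC`) on the chosen subset and reports the whole subset as used, since an SVM does not pick out individual variables the way tree splits do.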