Computer Methods and Programs in Biomedicine
Classification into multiple classes when the measured variables outnumber the samples is a major methodological challenge in -omics studies. Two algorithms that overcome this dimensionality problem are presented: the forest classification tree (FCT) and the forest support vector machines (FSVM). In FCT, a set of variables is randomly chosen and a classification tree (CT) is grown using a forward classification algorithm. The process is repeated and a forest of CTs is derived. Finally, the most frequent variables from the trees with the smallest apparent misclassification rate (AMR) are used to construct a productive tree. In FSVM, the CTs are replaced by SVMs. The methods are demonstrated using prostate gene expression data for classifying tissue samples into four tumor types. With a split threshold of 0.001 and 100 markers, the productive CT consisted of 29 terminal nodes and achieved perfect classification (AMR=0). When the threshold was set to 0.01, a tree with 17 terminal nodes was constructed based on 15 markers (AMR=7%). In FSVM, reducing the fraction of the forest used to construct the best classifier from the top 80% to the top 20% reduced the misclassification rate to 25% (when using 200 markers). The proposed methodologies may be used for identifying important variables in high-dimensional data. Furthermore, the FCT allows exploring the data structure and provides a decision rule.
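The FCT procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the forward classification algorithm or how the split threshold is defined, so a greedy Gini-based tree with a weighted impurity-decrease cutoff (`min_gain`) is swapped in. All function and parameter names (`grow_tree`, `forest_classification_tree`, `subset_size`, `top_fraction`, `n_markers`) are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(X, y, variables, min_gain=0.01, n_total=None):
    """Greedy binary tree restricted to `variables` (column indices).
    Assumption: a node is split only if the Gini decrease, weighted by the
    node's share of all samples, is at least `min_gain`."""
    n = len(y)
    n_total = n_total or n
    parent = gini(y)
    best = None  # (gain, variable, cut, left indices, right indices)
    for v in variables:
        values = sorted({row[v] for row in X})
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [i for i in range(n) if X[i][v] <= cut]
            right = [i for i in range(n) if X[i][v] > cut]
            child = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / n
            gain = (n / n_total) * (parent - child)
            if best is None or gain > best[0]:
                best = (gain, v, cut, left, right)
    if parent == 0 or best is None or best[0] < min_gain:
        return {"leaf": Counter(y).most_common(1)[0][0]}
    _, v, cut, left, right = best
    return {
        "var": v, "cut": cut,
        "left": grow_tree([X[i] for i in left], [y[i] for i in left],
                          variables, min_gain, n_total),
        "right": grow_tree([X[i] for i in right], [y[i] for i in right],
                           variables, min_gain, n_total),
    }

def predict(tree, row):
    """Walk from the root to a terminal node and return its class."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["var"]] <= tree["cut"] else tree["right"]
    return tree["leaf"]

def amr(tree, X, y):
    """Apparent misclassification rate: error on the data the tree was grown on."""
    return sum(predict(tree, r) != t for r, t in zip(X, y)) / len(y)

def used_variables(tree):
    """Set of variables that appear in at least one split of the tree."""
    if "leaf" in tree:
        return set()
    return ({tree["var"]} | used_variables(tree["left"])
            | used_variables(tree["right"]))

def forest_classification_tree(X, y, n_trees=50, subset_size=3,
                               top_fraction=0.2, n_markers=2,
                               min_gain=0.01, seed=0):
    """Grow a forest on random variable subsets, keep the trees with the
    smallest AMR, and grow a final tree on their most frequent variables."""
    rng = random.Random(seed)
    n_vars = len(X[0])
    forest = []
    for _ in range(n_trees):
        subset = rng.sample(range(n_vars), subset_size)
        tree = grow_tree(X, y, subset, min_gain)
        forest.append((amr(tree, X, y), tree))
    forest.sort(key=lambda pair: pair[0])
    keep = forest[:max(1, int(top_fraction * n_trees))]
    counts = Counter(v for _, tree in keep for v in used_variables(tree))
    markers = [v for v, _ in counts.most_common(n_markers)]
    return grow_tree(X, y, markers, min_gain), markers
```

Here `X` is a list of samples (each a list of numeric expression values) and `y` the matching class labels; the function returns the final tree together with the selected marker indices. Because the AMR is computed on the training data, it tends to be optimistic, so an independent test set or cross-validation would be needed to estimate real predictive error. An FSVM-style variant would replace `grow_tree` with an SVM fit on each random subset while keeping the same ranking-and-selection loop.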