Classification by ensembles from random partitions of high-dimensional data

Authors:
Hongshik Ahn; Hojin Moon; Melissa J. Fazzari; Noha Lim; James J. Chen; Ralph L. Kodell
Affiliations:
Hongshik Ahn, Melissa J. Fazzari, Noha Lim: Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794-3600, USA
Hojin Moon, James J. Chen: Division of Biometry and Risk Assessment, National Center for Toxicological Research, FDA, Jefferson, AR 72079, USA
Ralph L. Kodell: Department of Biostatistics, University of Arkansas for Medical Sciences, 4301 West Markham Street, Slot 781, Little Rock, AR 72205, USA
Venue:
Computational Statistics & Data Analysis
Year:
2007

Abstract

A robust classification procedure is developed based on ensembles of classifiers, each constructed from a different set of predictors determined by a random partition of the entire set of predictors. The proposed methods combine the results of multiple classifiers to achieve substantially better prediction than the optimal single classifier. The approach is designed specifically for classification of high-dimensional data sets. Because each classifier is built on only one subspace of the predictors, combining these classifiers gives the proposed methods a computational advantage in handling the growing problem of dimensionality. For each subspace of the predictors, we build a classification tree or a logistic regression tree. Using four real data sets from different application areas, our study shows that the methods perform consistently well compared with widely used classification methods. For unbalanced data, our approach maintains a better balance between sensitivity and specificity than many of the other classification methods considered in this study.
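The abstract describes the core procedure only in words: randomly partition the predictors into disjoint subsets, fit one classifier per subset, and combine their outputs. The following is a minimal, dependency-free Python sketch of that idea under stated assumptions, not the authors' implementation. A decision stump (a depth-one tree) stands in for the classification or logistic regression trees used in the paper; outputs are combined by simple majority vote within each partition-based ensemble and then across several independently partitioned ensembles (the voting rule is an assumption); and all function names (`random_partition`, `fit_stump`, `fit_ensembles`, `predict`) are hypothetical.

```python
import random
from collections import Counter


def random_partition(n_features, n_subsets, rng):
    """Randomly split feature indices 0..n_features-1 into n_subsets
    disjoint groups whose sizes differ by at most one."""
    idx = list(range(n_features))
    rng.shuffle(idx)
    return [idx[k::n_subsets] for k in range(n_subsets)]


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Fit a decision stump using only the given feature subset.
    Hypothetical stand-in for the per-subspace classification tree."""
    default = majority(y)  # fallback label for an empty side of a split
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            l_lab = majority(left) if left else default
            r_lab = majority(right) if right else default
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    return best[1:]  # (feature, threshold, left_label, right_label)


def predict_stump(stump, row):
    j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab


def fit_ensembles(X, y, n_subsets, n_ensembles, seed=0):
    """Build n_ensembles ensembles; each draws a fresh random partition of
    the predictors and fits one stump per subset of that partition."""
    n_features = len(X[0])
    if not 1 <= n_subsets <= n_features:
        raise ValueError("need 1 <= n_subsets <= number of features")
    rng = random.Random(seed)
    return [
        [fit_stump(X, y, part) for part in random_partition(n_features, n_subsets, rng)]
        for _ in range(n_ensembles)
    ]


def predict(ensembles, row):
    """Majority vote within each ensemble, then majority vote across ensembles."""
    votes = [majority([predict_stump(s, row) for s in ens]) for ens in ensembles]
    return majority(votes)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical toy data: 10 noisy informative predictors plus 20 pure-noise ones.
    y = [int(i >= 20) for i in range(40)]
    X = [[yi + rng.gauss(0, 0.8) for _ in range(10)] + [rng.random() for _ in range(20)]
         for yi in y]
    model = fit_ensembles(X, y, n_subsets=5, n_ensembles=3, seed=7)
    acc = sum(predict(model, row) == yi for row, yi in zip(X, y)) / len(y)
    print(f"training accuracy: {acc:.2f}")
```

Each base learner searches only about p/k of the p predictors (k being the number of subsets), so individual fits stay cheap even when p is large, while re-partitioning across ensembles lets each predictor be grouped with different neighbours. Using odd numbers of subsets and ensembles avoids voting ties for two-class problems.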