Classification by ensembles from random partitions of high-dimensional data

  • Authors:
  • Hongshik Ahn; Hojin Moon; Melissa J. Fazzari; Noha Lim; James J. Chen; Ralph L. Kodell

  • Affiliations:
  • Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794-3600, USA; Division of Biometry and Risk Assessment, National Center for Toxicological Research, FDA, Jefferson, AR 72079, USA; Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794-3600, USA; Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794-3600, USA; Division of Biometry and Risk Assessment, National Center for Toxicological Research, FDA, Jefferson, AR 72079, USA; Department of Biostatistics, University of Arkansas for Medical Sciences, 4301 West Markham Street, Slot 781, Little Rock, AR 72205, USA

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2007

Abstract

A robust classification procedure is developed based on ensembles of classifiers, in which each classifier is constructed from a different set of predictors determined by a random partition of the entire set of predictors. The proposed methods combine the results of multiple classifiers to achieve substantially improved prediction compared to the optimal single classifier. This approach is designed specifically for high-dimensional data sets for which a classifier is sought. By combining classifiers built from each subspace of the predictors, the proposed methods achieve a computational advantage in tackling the growing problem of dimensionality. For each subspace of the predictors, we build a classification tree or a logistic regression tree. Our study shows, using four real data sets from different areas, that our methods perform consistently well compared to widely used classification methods. For unbalanced data, our approach maintains the balance between sensitivity and specificity better than many of the other classification methods considered in this study.
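
The general idea described in the abstract (randomly partition the predictors, fit one classifier per subset, combine the results) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It rests on several assumptions: the base learner is a one-split decision stump standing in for the classification tree or logistic regression tree named in the abstract, the results are combined by simple majority vote (the abstract does not specify the combination rule), and all function names (`random_partition`, `fit_stump`, `fit_ensemble`, `predict_ensemble`) are hypothetical.

```python
import random
from collections import Counter


def random_partition(n_features, n_subsets, rng):
    """Shuffle feature indices and deal them into mutually exclusive subsets."""
    if not 1 <= n_subsets <= n_features:
        raise ValueError("n_subsets must be between 1 and n_features")
    idx = list(range(n_features))
    rng.shuffle(idx)
    return [idx[i::n_subsets] for i in range(n_subsets)]


def fit_stump(X, y, features):
    """Assumed stand-in base learner: the best single-feature threshold split,
    searched only over the given feature subset."""
    best = None
    for j in features:
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(yi != l_lab for yi in left)
                      + sum(yi != r_lab for yi in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:
        # Every feature in the subset is constant: predict the majority class.
        maj = Counter(y).most_common(1)[0][0]
        return (features[0], float("inf"), maj, maj)
    return best[1:]


def predict_stump(stump, row):
    j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab


def fit_ensemble(X, y, n_subsets, seed=0):
    """Fit one base classifier per subset of a single random partition."""
    rng = random.Random(seed)
    subsets = random_partition(len(X[0]), n_subsets, rng)
    return [fit_stump(X, y, s) for s in subsets]


def predict_ensemble(stumps, row):
    """Combine the base classifiers by majority vote (an assumption)."""
    votes = Counter(predict_stump(s, row) for s in stumps)
    return votes.most_common(1)[0][0]
```

Because each base classifier sees only its own subset of the predictors, no single fit has to search the full predictor space, which is the kind of computational advantage the abstract refers to. In practice the stump would be replaced by a full tree learner, and the partition could be redrawn several times to build a larger ensemble; both choices go beyond what the abstract states.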