Classification in High-Dimensional Feature Spaces: Random Subsample Ensemble

  • Authors:
  • Gursel Serpen;Santhosh Pathical

  • Venue:
  • ICMLA '09 Proceedings of the 2009 International Conference on Machine Learning and Applications
  • Year:
  • 2009

Abstract

This paper applies machine learning ensembles to classification problems with high-dimensional feature spaces. The motivation is to address the challenges of algorithm scalability, data sparsity, and information loss that arise from the so-called curse of dimensionality. The original high-dimensional feature space is randomly projected onto a number of lower-dimensional feature subspaces. Each subspace constitutes the domain of a classification subtask and is associated with a base learner within the ensemble. Such an ensemble is called a random subsample ensemble. Simulations on data sets with up to 20,000 features indicate that the random subsample ensemble classifier performs comparably to other benchmark machine learners in terms of prediction accuracy and CPU time. This finding establishes the feasibility of the ensemble and positions it to tackle classification problems with even higher-dimensional feature spaces.
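
The idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the projection scheme, the base learner, or the rule for combining predictions, so everything concrete here is an assumption: the "random projection" is taken to mean selecting a random subset of the original features without replacement, the base learner is a simple nearest-centroid classifier, and predictions are combined by majority vote. The function names (`make_toy_data`, `fit_ensemble`, `predict_ensemble`, and so on) and the synthetic data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def make_toy_data(n_per_class, n_features, seed=0):
    """Synthetic two-class data (hypothetical): class 1 is shifted by +3."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(3.0 * label, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def train_centroids(X, y, features):
    """Fit a nearest-centroid base learner on the chosen feature subset."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict_centroids(centroids, row, features):
    """Return the label whose centroid is closest in the feature subspace."""
    proj = [row[f] for f in features]
    return min(
        sorted(centroids),
        key=lambda c: sum((p - m) ** 2 for p, m in zip(proj, centroids[c])),
    )


def fit_ensemble(X, y, n_learners, subspace_dim, seed=0):
    """Train one base learner per randomly chosen feature subspace.

    Assumption: each subspace is a random subset of the original features,
    drawn without replacement.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_learners):
        features = rng.sample(range(n_features), subspace_dim)
        ensemble.append((features, train_centroids(X, y, features)))
    return ensemble


def predict_ensemble(ensemble, row):
    """Combine base-learner predictions by majority vote (an assumption)."""
    votes = Counter(predict_centroids(c, row, f) for f, c in ensemble)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X, y = make_toy_data(n_per_class=20, n_features=200, seed=1)
    ensemble = fit_ensemble(X, y, n_learners=10, subspace_dim=20, seed=2)
    predictions = [predict_ensemble(ensemble, row) for row in X]
    accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
    print(f"training accuracy: {accuracy:.2f}")
```

Because each base learner sees only `subspace_dim` of the original features, its training cost depends on the subspace size rather than on the full dimensionality, which is the scalability argument the abstract makes. A different base learner or combination rule could be substituted without changing the overall structure.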