Hybrid random subsample classifier ensemble for high dimensional data sets

  • Authors:
  • Santhosh Pathical; Gursel Serpen

  • Affiliations:
  • Electrical Engineering and Computer Science Department, University of Toledo, Toledo, OH, USA; Electrical Engineering and Computer Science Department, University of Toledo, Toledo, OH, USA

  • Venue:
  • International Journal of Hybrid Intelligent Systems
  • Year:
  • 2012

Abstract

This paper presents a comparative performance evaluation of a hybrid random subsample classifier ensemble against leading machine learning classifiers on high-dimensional datasets. The classification performance of the hybrid random subsample ensemble is compared with that of a comprehensive set of machine learning classification algorithms, using both in-house simulations and results published by others in the literature. The comparison is based on prediction accuracy on six datasets from the UCI Machine Learning Repository, namely Dexter, Madelon, Isolet, Multiple Features, Internet Ads, and Citeseer, with feature counts of up to 105,000. Simulation results show that the hybrid random subsample ensemble performs competitively on high-dimensional datasets. Specifically, the findings indicate that hybrid random subsample ensembles with a subsample rate of 15% and 25 or more base classifiers can achieve highly competitive performance on high-dimensional datasets when compared with leading machine learning classification algorithms.
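The abstract does not spell out how the ensemble is built, so the following is only a minimal sketch under stated assumptions. It assumes that "subsample rate" means the fraction of features drawn at random, without replacement, for each base classifier (random-subspace style), that "hybrid" means the base classifiers are of more than one type, and that member predictions are combined by majority vote. The two base learners (nearest centroid and 1-nearest-neighbour), the function names `fit_rse` and `predict_rse`, and the synthetic data are hypothetical choices for illustration, not the authors' configuration.

```python
import random
from collections import Counter

# Hypothetical base learner 1: nearest centroid (squared Euclidean distance).
def nearest_centroid(X, y):
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    def predict(x):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return predict

# Hypothetical base learner 2: 1-nearest-neighbour on the training rows.
def one_nn(X, y):
    def predict(x):
        j = min(range(len(X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x, X[i])))
        return y[j]
    return predict

def fit_rse(X, y, rate=0.15, n_members=25,
            learners=(nearest_centroid, one_nn), seed=0):
    """Train n_members base classifiers, each on a random feature subset.

    ASSUMPTION: the subsample is taken over features, with size rate * d.
    Base learner types alternate across members to make the ensemble hybrid.
    """
    rng = random.Random(seed)
    d = len(X[0])
    k = max(1, round(rate * d))
    members = []
    for m in range(n_members):
        feats = rng.sample(range(d), k)  # distinct feature indices
        learner = learners[m % len(learners)]
        X_sub = [[x[f] for f in feats] for x in X]
        members.append((feats, learner(X_sub, y)))
    return members

def predict_rse(members, x):
    """Majority vote over the members' predictions (assumed combiner)."""
    votes = Counter(model([x[f] for f in feats]) for feats, model in members)
    return votes.most_common(1)[0][0]

# Usage on synthetic two-class data with 40 features (hypothetical data).
data_rng = random.Random(1)

def make(n, d=40, shift=2.0):
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([data_rng.gauss(shift * label, 1.0) for _ in range(d)])
        y.append(label)
    return X, y

X_train, y_train = make(60)
X_test, y_test = make(100)
members = fit_rse(X_train, y_train)  # 25 members, 6 features each (15% of 40)
acc = sum(predict_rse(members, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print(f"test accuracy: {acc:.2f}")
```

Alternating learner types is just one simple way to make the ensemble heterogeneous. Because each member sees only 15% of the features, training cost per member drops sharply on wide datasets, and the vote aggregates the evidence spread across different subsets.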