Enhanced classification for high-throughput data with an optimal projection and hybrid classifier

  • Authors:
  • Joon Jin Song; Jingying Zhang

  • Affiliations:
  • Center for Statistical Research and Consulting, Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR 72701, USA; Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR 72701, USA

  • Venue:
  • International Journal of Data Mining and Bioinformatics
  • Year:
  • 2014

Abstract

Recently developed high-throughput screening technologies allow scientists to conduct millions of biological and medical tests simultaneously and rapidly. A major bottleneck in analysing such data is reducing their inherent high dimensionality before subsequent analysis. Principal Component Analysis (PCA) is a popular tool for dimensionality reduction, typically by selecting a few Principal Components (PCs) ranked by their variances (eigenvalues). Since this selection approach is not always effective in reducing dimensionality, we consider a different ranking criterion, the canonical variate criterion. To further enhance classification performance, we propose an integrated classification framework that combines this criterion with two hybrid classification methods, and we compare it with several popular classification methods using leave-one-out cross-validation. Three real high-throughput data sets are analysed to illustrate the methods.
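The abstract does not spell out the exact form of the canonical variate criterion or of the two hybrid classifiers, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the criterion scores each PC by the ratio of the between-class to the within-class sum of squares of its scores (a Fisher-type ratio), and it uses a plain nearest-centroid rule as a stand-in for the hybrid classifiers. The function names `fit_pca`, `canonical_ratio`, `rank_pcs` and `loocv_accuracy` are hypothetical, and the synthetic data at the end are invented purely for illustration. The sketch contrasts ranking PCs by eigenvalue with ranking them by the assumed criterion, and scores both with leave-one-out cross-validation, refitting PCA and the ranking inside every fold so the held-out sample never influences which PCs are kept.

```python
import numpy as np


def fit_pca(X):
    """Center X (n samples x p features) and return the mean, the principal
    axes (rows of Vt) and the eigenvalues, largest eigenvalue first."""
    mean = X.mean(axis=0)
    # np.linalg.svd returns singular values in descending order
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    eigenvalues = s ** 2 / (X.shape[0] - 1)
    return mean, Vt, eigenvalues


def canonical_ratio(z, y):
    """ASSUMED canonical-variate-style score of one PC: between-class sum of
    squares of its scores divided by the within-class sum of squares."""
    grand = z.mean()
    between = within = 0.0
    for c in np.unique(y):
        zc = z[y == c]
        between += zc.size * (zc.mean() - grand) ** 2
        within += ((zc - zc.mean()) ** 2).sum()
    return np.inf if within == 0 else between / within


def rank_pcs(X, y, criterion="canonical"):
    """Return PC indices ordered by the chosen criterion, plus the fitted PCA."""
    mean, Vt, eigenvalues = fit_pca(X)
    if criterion == "variance":
        order = np.argsort(-eigenvalues)  # classic ranking by eigenvalue
    else:
        Z = (X - mean) @ Vt.T  # PC scores, one column per component
        ratios = np.array([canonical_ratio(Z[:, j], y) for j in range(Z.shape[1])])
        order = np.argsort(-ratios)  # most class-separating PC first
    return order, mean, Vt


def loocv_accuracy(X, y, k, criterion="canonical"):
    """Leave-one-out accuracy of a nearest-centroid rule (a stand-in, not the
    paper's hybrid classifiers) on the top-k ranked PCs."""
    n = X.shape[0]
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        Xtr, ytr = X[train], y[train]
        # Refit PCA and re-rank PCs without the held-out sample
        order, mean, Vt = rank_pcs(Xtr, ytr, criterion)
        keep = order[:k]
        Ztr = ((Xtr - mean) @ Vt.T)[:, keep]
        z_test = ((X[i] - mean) @ Vt.T)[keep]
        centroids = {c: Ztr[ytr == c].mean(axis=0) for c in np.unique(ytr)}
        pred = min(centroids, key=lambda c: np.sum((z_test - centroids[c]) ** 2))
        correct += int(pred == y[i])
    return correct / n


# Hypothetical toy data: feature 0 is high-variance noise, feature 1 is a
# low-variance feature that separates the two classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = np.column_stack([
    rng.normal(0, 10, 40),
    np.where(y == 0, -1.0, 1.0) + rng.normal(0, 0.3, 40),
])
print(rank_pcs(X, y, "variance")[0], rank_pcs(X, y, "canonical")[0])
print(loocv_accuracy(X, y, 1, "variance"), loocv_accuracy(X, y, 1, "canonical"))
```

On data like this, the eigenvalue ranking puts the noise direction first, while the assumed between/within ratio should promote the low-variance but class-separating PC, which is the kind of situation in which the abstract notes that variance-based selection is not always effective.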