An empirical comparison of dimensionality reduction methods for classifying gene and protein expression datasets

  • Authors:
  • George Lee;Carlos Rodriguez;Anant Madabhushi

  • Affiliations:
  • Rutgers, The State University of New Jersey, Department of Biomedical Engineering, Piscataway, NJ;University of Puerto Rico, Mayagüez, PR;Rutgers, The State University of New Jersey, Department of Biomedical Engineering, Piscataway, NJ

  • Venue:
  • ISBRA'07 Proceedings of the 3rd international conference on Bioinformatics research and applications
  • Year:
  • 2007

Abstract

The recent explosion in the availability of gene and protein expression data for cancer detection has necessitated the development of sophisticated machine learning tools for high-dimensional data analysis. Previous attempts at gene expression analysis have typically used a linear dimensionality reduction method such as Principal Components Analysis (PCA). Linear dimensionality reduction methods do not, however, account for the inherent nonlinearity within the data. The motivation behind this work is to demonstrate that nonlinear dimensionality reduction methods are more adept than linear methods at capturing the nonlinearity within the data, and hence would result in better classification and potentially aid in the visualization and identification of new data classes. Consequently, in this paper, we empirically compare the performance of three commonly used linear and three nonlinear dimensionality reduction techniques from the perspective of (a) distinguishing objects belonging to cancer and non-cancer classes and (b) new class discovery in high-dimensional gene and protein expression studies for different types of cancer. Quantitative evaluation using a support vector machine and a decision tree classifier revealed a statistically significant improvement in classification accuracy when using nonlinear dimensionality reduction methods rather than linear methods.
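
The kind of comparison described in the abstract can be sketched roughly as follows. This is only an illustration under several assumptions, not the authors' pipeline: it uses scikit-learn, a synthetic dataset standing in for an expression matrix (few samples, many features), PCA as the linear method named in the abstract, and Isomap as one example of a nonlinear method (the abstract does not say which nonlinear methods were evaluated). The number of retained dimensions, the neighborhood size, and the 5-fold cross-validation are likewise arbitrary choices made for the sketch.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for an expression dataset (assumption):
# 120 samples, 500 features, two classes (e.g. cancer vs. non-cancer).
X, y = make_classification(
    n_samples=120, n_features=500, n_informative=20, random_state=0
)

# One linear and one nonlinear reducer; Isomap is an illustrative choice.
reducers = {
    "PCA (linear)": PCA(n_components=5),
    "Isomap (nonlinear)": Isomap(n_components=5, n_neighbors=10),
}

# The two classifier families mentioned in the abstract.
classifiers = {
    "SVM": SVC(kernel="rbf"),
    "Decision tree": DecisionTreeClassifier(random_state=0),
}

# Mean cross-validated accuracy for every (reducer, classifier) pair.
results = {}
for reducer_name, reducer in reducers.items():
    for clf_name, clf in classifiers.items():
        # The reducer is fitted inside each fold, so the held-out
        # samples never influence the learned embedding.
        pipeline = make_pipeline(StandardScaler(), reducer, clf)
        scores = cross_val_score(pipeline, X, y, cv=5)
        results[(reducer_name, clf_name)] = scores.mean()
        print(f"{reducer_name:20s} + {clf_name:14s}: {scores.mean():.3f}")
```

On synthetic data the ranking of the two reducers carries no weight; the sketch only shows the structure of the experiment. Reproducing the reported result would require the actual expression datasets and a significance test over the per-fold accuracies.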