Comprehensive vertical sample-based KNN/LSVM classification for gene expression analysis

  • Authors:
  • Fei Pan;Baoying Wang;Xin Hu;William Perrizo

  • Affiliations:
  • Department of Computer Science, North Dakota State University, Fargo, ND;Department of Computer Science, North Dakota State University, Fargo, ND;Laboratory of Structural Microbiology, The Rockefeller University, New York, NY;Department of Computer Science, North Dakota State University, Fargo, ND

  • Venue:
  • Journal of Biomedical Informatics - Special issue: Biomedical machine learning
  • Year:
  • 2004

Abstract

Classification analysis of microarray gene expression data has been widely used to uncover biological features and to distinguish closely related cell types that often appear in the diagnosis of cancer. However, gene expression data are often very high-dimensional, with hundreds or thousands of dimensions. Accurate and efficient classification of such high-dimensional data remains a contemporary challenge. In this paper, we propose a comprehensive vertical sample-based KNN/LSVM classification approach for high-dimensional data, with weights optimized by genetic algorithms. Experiments on common gene expression datasets demonstrated that our approach can achieve high accuracy and efficiency at the same time. The speed improvement comes mainly from the vertical data representation, the P-tree, and its optimized logical algebra. The high accuracy is due to the combination of a KNN majority voting approach and a local support vector machine approach that makes optimal decisions at the local level. As a result, our approach could be a powerful tool for high-dimensional gene expression data analysis.
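The general idea of combining a KNN majority vote with a local linear SVM can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain Python lists rather than P-trees, the feature weights are set by hand where the paper would obtain them from a genetic algorithm, and the rule for switching from voting to the local SVM (use the SVM only when the neighbors disagree) is an assumption made for illustration. All function names, parameters, and the toy data are hypothetical.

```python
def nearest_neighbors(X, query, weights, k):
    """Indices of the k training rows closest to `query` under a
    feature-weighted squared Euclidean distance."""
    def dist(row):
        return sum(w * (a - b) ** 2 for w, a, b in zip(weights, row, query))
    return sorted(range(len(X)), key=lambda i: dist(X[i]))[:k]


def fit_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM trained by batch subgradient descent on the
    regularized hinge loss. Labels must be -1 or +1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only samples inside the margin contribute to the subgradient.
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def classify(X, y, query, weights, k=3):
    """Assumed decision rule: return the neighbors' label when they all
    agree; otherwise fit a linear SVM on the k neighbors only."""
    idx = nearest_neighbors(X, query, weights, k)
    votes = [y[i] for i in idx]
    if all(v == votes[0] for v in votes):
        return votes[0]
    # Scale features by sqrt(weight) so the SVM sees the same geometry
    # as the weighted distance used for the neighbor search.
    scale = [wt ** 0.5 for wt in weights]
    local_X = [[s * v for s, v in zip(scale, X[i])] for i in idx]
    w, b = fit_linear_svm(local_X, votes)
    score = sum(wj * s * qj for wj, s, qj in zip(w, scale, query)) + b
    return 1 if score >= 0 else -1


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [1.0, -3.0], [0.5, 9.0],
     [4.0, 5.0], [5.0, -3.0], [4.5, 9.0]]
y = [-1, -1, -1, 1, 1, 1]
weights = [1.0, 0.0]  # hand-set stand-in for GA-optimized weights

print(classify(X, y, [0.2, 100.0], weights, k=3))  # unanimous neighbors
print(classify(X, y, [3.9, 0.0], weights, k=4))    # mixed neighbors, local SVM
```

With the noisy feature weighted to zero, the first query is decided by a unanimous vote, while the second has neighbors from both classes and is decided by the SVM fitted on those neighbors alone.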