Multidimensional support vector machines for visualization of gene expression data

  • Authors:
  • D. Komura; H. Nakamura; S. Tsutsumi; H. Aburatani; S. Ihara

  • Affiliations:
  • Research Center for Advanced Science and Technology, University of Tokyo, Tokyo 153-8904, Japan; Research Center for Advanced Science and Technology, University of Tokyo, Tokyo 153-8904, Japan; Research Center for Advanced Science and Technology, University of Tokyo, Tokyo 153-8904, Japan; Genome Science Division, Center for Collaborative Research, University of Tokyo, Tokyo 153-8904, Japan; Research Center for Advanced Science and Technology, University of Tokyo, Tokyo 153-8904, Japan

  • Venue:
  • Bioinformatics
  • Year:
  • 2005

Abstract

Motivation: DNA microarray experiments produce large amounts of gene expression data, which must be analyzed with statistical methods to extract the meaning of the experimental results. Dimensionality reduction methods such as Principal Component Analysis (PCA) are used to roughly visualize the distribution of high-dimensional gene expression data. However, in binary classification of gene expression data, PCA does not use class information when choosing its axes. Thus data that are clearly separable in the original space may not be separable in the reduced space produced by PCA.

Results: For visualization and class prediction of gene expression data, we have developed a new SVM-based method, called multidimensional SVMs, that generates multiple orthogonal axes. This method projects high-dimensional data into a lower-dimensional space to exhibit properties of the data clearly and to visualize the distribution of the data roughly. Furthermore, the multiple axes can be used for class prediction. The basic properties of conventional SVMs are retained in our method: solutions of the mathematical programming problem are sparse, and nonlinear classification is implemented implicitly through the use of kernel functions. Application of our method to experimentally obtained gene expression datasets from patient samples indicates that our algorithm is efficient and useful for visualization and class prediction.

Contact: komura@hal.rcast.u-tokyo.ac.jp
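
Illustrative sketch: The abstract does not give the mathematical programming formulation, so the following is only a rough illustration of the general idea of obtaining several mutually orthogonal, class-aware axes from SVMs. It is not the authors' algorithm. It assumes a swapped-in technique, deflation: fit a linear SVM, keep its unit normal vector as an axis, remove that direction from every sample, and refit on the residual data. The solver is a simple Pegasos-style subgradient method, the data are synthetic, and all names (`train_linear_svm`, `orthogonal_svm_axes`, `lam`, `epochs`) are hypothetical. Kernels and the sparsity property mentioned in the abstract are not covered.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    # Pegasos-style stochastic subgradient descent on the L2-regularised
    # hinge loss, without a bias term (data are centred beforehand).
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1.0
            w = [(1.0 - eta * lam) * wj for wj in w]
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def orthogonal_svm_axes(X, y, n_axes=2):
    # Assumption (not from the paper): obtain orthogonal axes by deflation.
    # Each new weight vector is a combination of residual samples, which
    # have no component along earlier axes, so the axes are orthogonal.
    residual = [list(x) for x in X]
    axes = []
    for k in range(n_axes):
        w = train_linear_svm(residual, y, seed=k)
        norm = math.sqrt(dot(w, w))
        u = [wj / norm for wj in w]
        axes.append(u)
        deflated = []
        for x in residual:
            p = dot(x, u)
            deflated.append([xj - p * uj for xj, uj in zip(x, u)])
        residual = deflated
    return axes

# Synthetic stand-in for expression data: 40 samples, 5 "genes", two classes.
rng = random.Random(42)
y = [1] * 20 + [-1] * 20
X = []
for label in y:
    x = [rng.gauss(0.0, 1.0) for _ in range(5)]
    x[0] += 2.0 * label  # class signal on the first two features only
    x[1] += 1.0 * label
    X.append(x)

# Centre each feature so that omitting the bias term is reasonable.
means = [sum(col) / len(X) for col in zip(*X)]
X = [[xj - mj for xj, mj in zip(x, means)] for x in X]

axes = orthogonal_svm_axes(X, y, n_axes=2)
coords = [[dot(x, u) for u in axes] for x in X]  # 2-D coordinates for plotting
print("axis inner product:", dot(axes[0], axes[1]))
```

Each row of `coords` gives a sample's position on the two axes and could be passed to any scatter-plot routine. Unlike PCA, the first axis here is chosen using the class labels, which is the contrast the abstract draws.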