Penalized principal component analysis of microarray data

  • Authors:
  • Vladimir Nikulin;Geoffrey J. McLachlan

  • Affiliations:
  • Department of Mathematics, University of Queensland;Department of Mathematics, University of Queensland

  • Venue:
  • CIBB'09 Proceedings of the 6th international conference on Computational intelligence methods for bioinformatics and biostatistics
  • Year:
  • 2009

Abstract

Microarray data are high-dimensional: the expression levels of thousands of genes are measured in a much smaller number of samples. This presents challenges that affect the validity of the analytical results. Hence attention has to be given to some form of dimension reduction, so that the data are represented in terms of a smaller number of variables. These variables are often chosen to be linear combinations of the original variables (genes), called metagenes. One commonly used approach is principal component analysis (PCA), which can be implemented via a singular value decomposition (SVD). However, for a high-dimensional matrix, SVD may be very expensive in terms of computational time. We propose to reduce the SVD task to an ordinary maximisation problem with a Euclidean norm, which may be solved easily using gradient-based optimisation. We demonstrate the effectiveness of this approach in the supervised classification of gene expression data.
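
The abstract does not state the penalized objective itself, so the following is only a minimal sketch of the general idea, under assumptions of our own. It takes the leading principal direction to be the unit vector w that maximises the squared Euclidean norm of Xw, and finds it by projected gradient ascent rather than a full SVD. The function name `leading_component`, the step-size rule, and the iteration count are hypothetical choices for illustration, not the authors' method.

```python
import numpy as np


def leading_component(X, n_iter=500, seed=0):
    """Approximate the first principal direction of a centred matrix X.

    Illustrative sketch (not the paper's algorithm): maximise ||X w||^2
    subject to ||w|| = 1 by projected gradient ascent.
    X has shape (n_samples, n_genes) and is assumed column-centred.
    """
    norm_sq = np.linalg.norm(X) ** 2  # squared Frobenius norm of X
    if norm_sq == 0.0:
        raise ValueError("X must contain at least one non-zero entry")
    step = 1.0 / norm_sq  # assumed step size; any positive value works here

    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1])
    w /= np.linalg.norm(w)

    for _ in range(n_iter):
        grad = 2.0 * (X.T @ (X @ w))  # gradient of ||X w||^2 with respect to w
        w = w + step * grad           # ascent step
        w /= np.linalg.norm(w)        # project back onto the unit sphere
    return w
```

For a column-centred matrix `Xc` (samples in rows, genes in columns), `w = leading_component(Xc)` gives the gene weights and `Xc @ w` gives one metagene value per sample. Each iteration costs only two matrix-vector products, which is the practical appeal when the number of genes is large. Convergence slows when the two largest singular values are close.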