A Weighted Principal Component Analysis and Its Application to Gene Expression Data

Authors:
Joaquim F. Pinto da Costa;Hugo Alonso;Luis Roque
Affiliations:
University of Porto, Porto;University of Aveiro, Aveiro;ISEP, Porto
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2011

Abstract

This work has two parts. The first introduces new developments in Principal Component Analysis (PCA); the second proposes a new method for selecting variables (genes, in our application). We focus on problems in which the values taken by each variable are not all equally important and in which the data may be contaminated with noise and contain outliers, as is the case with microarray data. The usual PCA is not appropriate for this kind of problem. In this context, we propose a new correlation coefficient as an alternative to Pearson's, which leads to a so-called weighted PCA (WPCA). To illustrate the features of our WPCA and compare it with the usual PCA, we analyze gene expression data sets. In the second part, we propose a new PCA-based algorithm that iteratively selects the most important genes in a microarray data set. We show that this algorithm produces better results when our WPCA is used instead of the usual PCA. Furthermore, using Support Vector Machines, we show that it can compete with the Significance Analysis of Microarrays algorithm.
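
The general idea of replacing Pearson's coefficient inside PCA can be illustrated with a minimal sketch. The abstract does not define the new correlation coefficient, so the code below is not the authors' WPCA: it uses a plain rank-based (Spearman-style) correlation as a stand-in, and the function names `pearson_corr`, `rank_corr`, and `correlation_pca` are hypothetical. The point is only that PCA can be run on any correlation matrix by swapping the function that builds it.

```python
import numpy as np


def pearson_corr(X):
    """Pearson correlation matrix of the columns of X (samples x variables)."""
    return np.corrcoef(X, rowvar=False)


def rank_corr(X):
    """Stand-in alternative coefficient (NOT the paper's coefficient):
    Pearson correlation of column-wise ranks. Ties are not averaged."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)


def correlation_pca(X, corr_fn=pearson_corr, n_components=2):
    """PCA by eigendecomposition of a correlation matrix built by corr_fn.

    Returns the leading eigenvalues, the corresponding loadings (columns),
    and the scores. Projecting the standardized data onto the loadings is
    an assumption made for this sketch.
    """
    R = corr_fn(X)
    # eigh is appropriate because a correlation matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(R)
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    order = np.argsort(eigvals)[::-1][:n_components]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    scores = Z @ eigvecs[:, order]
    return eigvals[order], eigvecs[:, order], scores


# Usage: synthetic data with one gross outlier, analyzed with both coefficients.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 5))
X_demo[0, 0] = 50.0  # simulated outlier
for name, fn in [("pearson", pearson_corr), ("rank", rank_corr)]:
    vals_demo, _, _ = correlation_pca(X_demo, corr_fn=fn)
    print(name, vals_demo)
```

Keeping the coefficient behind a function argument means a different coefficient, such as a weighted one, could be substituted without touching the eigendecomposition step.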