Non-linear PCA: a missing data approach

  • Authors:
  • Matthias Scholz; Fatma Kaplan; Charles L. Guy; Joachim Kopka; Joachim Selbig

  • Affiliations:
  • Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany; University of Florida, Plant Molecular and Cellular Biology Program, Department of Environmental Horticulture, Gainesville, Florida 32611, USA; University of Florida, Plant Molecular and Cellular Biology Program, Department of Environmental Horticulture, Gainesville, Florida 32611, USA; Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany; Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany

  • Venue:
  • Bioinformatics
  • Year:
  • 2005

Quantified Score

Hi-index 3.84

Abstract

Motivation: Visualizing and analysing the potential non-linear structure of a dataset is becoming an important task in molecular biology. This is even more challenging when the data have missing values.

Results: Here, we propose an inverse model that performs non-linear principal component analysis (NLPCA) from incomplete datasets. Missing values are ignored while optimizing the model, but can be estimated afterwards. Results are shown for both artificial and experimental datasets. In contrast to linear methods, non-linear methods were able to give better missing value estimations for non-linear structured data.

Application: We applied this technique to a time course of metabolite data from a cold stress experiment on the model plant Arabidopsis thaliana, and could approximate the mapping function from any time point to the metabolite responses. Thus, the inverse NLPCA provides greatly improved information for better understanding the complex response to cold stress.

Contact: scholz@mpimp-golm.mpg.de
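
The idea stated in the abstract, ignoring missing values during optimization and estimating them afterwards, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that "inverse model" means a decoder-only network whose inputs (the component scores Z) are learned together with the weights, and it assumes a single tanh hidden layer trained with Adam. The function names fit_inverse_nlpca, reconstruct and impute, the layer sizes, and the half-circle toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_inverse_nlpca(X, n_components=1, n_hidden=8, n_iter=2000, lr=0.01, seed=0):
    """Fit a decoder z -> x using only the observed entries of X (NaN = missing)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mask = ~np.isnan(X)            # True where a value was measured
    X0 = np.where(mask, X, 0.0)    # placeholder zeros; never enter the error
    p = {
        "Z": 0.1 * rng.standard_normal((n, n_components)),  # scores, one row per sample
        "W1": 0.5 * rng.standard_normal((n_components, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": 0.5 * rng.standard_normal((n_hidden, d)),
        "b2": np.zeros(d),
    }
    m1 = {k: np.zeros_like(val) for k, val in p.items()}  # Adam first moments
    m2 = {k: np.zeros_like(val) for k, val in p.items()}  # Adam second moments
    losses = []
    for t in range(1, n_iter + 1):
        H = np.tanh(p["Z"] @ p["W1"] + p["b1"])
        X_hat = H @ p["W2"] + p["b2"]
        R = np.where(mask, X_hat - X0, 0.0)   # residuals at missing entries are zeroed
        losses.append(float((R ** 2).sum() / mask.sum()))
        # Gradients of half the summed squared masked error.
        dA = (R @ p["W2"].T) * (1.0 - H ** 2)
        g = {"Z": dA @ p["W1"].T, "W1": p["Z"].T @ dA, "b1": dA.sum(axis=0),
             "W2": H.T @ R, "b2": R.sum(axis=0)}
        for k in p:  # Adam update, applied to the weights and to the scores Z
            m1[k] = 0.9 * m1[k] + 0.1 * g[k]
            m2[k] = 0.999 * m2[k] + 0.001 * g[k] ** 2
            step = (m1[k] / (1 - 0.9 ** t)) / (np.sqrt(m2[k] / (1 - 0.999 ** t)) + 1e-8)
            p[k] -= lr * step
    return p, losses

def reconstruct(p):
    """Map the learned scores back to data space."""
    H = np.tanh(p["Z"] @ p["W1"] + p["b1"])
    return H @ p["W2"] + p["b2"]

def impute(X, p):
    """Keep observed values; fill missing ones with the model reconstruction."""
    return np.where(np.isnan(X), reconstruct(p), X)

# Toy data (assumption): noisy half circle with 10 missing values in column 1.
rng = np.random.default_rng(1)
t = np.linspace(0.0, np.pi, 60)
X_full = np.column_stack([np.cos(t), np.sin(t)]) + 0.02 * rng.standard_normal((60, 2))
X = X_full.copy()
X[rng.choice(60, size=10, replace=False), 1] = np.nan

params, losses = fit_inverse_nlpca(X)
X_imputed = impute(X, params)
```

Because the residual matrix is zeroed wherever a value is missing, those entries contribute nothing to the gradients of either the weights or the scores. A sample with a missing value is still placed on the curve by its observed entries, which is what makes the estimate afterwards possible.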