Improving cluster visualization in self-organizing maps: Application in gene expression data analysis

  • Authors:
  • Elmer A. Fernandez; Monica Balzarini

  • Affiliations:
  • Faculty of Engineering, Catholic University of Córdoba, Córdoba, Camino Alta Gracia Km 10, Córdoba, Argentina and National Council of Scientific and Technological Research (CONICET), Arg ...; National Council of Scientific and Technological Research (CONICET), Argentina and Statistics and Biometry, National University of Córdoba, Córdoba, Argentina

  • Venue:
  • Computers in Biology and Medicine
  • Year:
  • 2007

Abstract

Cluster analysis is a crucial step in gene expression pattern (GEP) analysis. It leads to the discovery or identification of temporal patterns and coexpressed genes. GEP analysis involves high-dimensional multivariate data, which demand appropriate tools. A good alternative for grouping many multidimensional objects is the self-organizing map (SOM), an unsupervised neural network algorithm able to find relationships among data. A SOM groups the data and maps them so that topological relationships are preserved. However, it may be difficult to identify clusters with the usual visualization tools for SOM. We propose a simple algorithm to identify and visualize clusters in SOM (the RP-Q method). The RP is a new node-adaptive attribute that moves in a two-dimensional virtual space, imitating the movement of the codebook vectors of the SOM net in the input space. The Q statistic evaluates the SOM structure, providing an estimate of the number of clusters underlying the data set. The SOM-RP-Q algorithm permits the visualization of clusters in the SOM and their node patterns. The algorithm was evaluated on several simulated and real GEP data sets. Results show that the proposed algorithm successfully displays the underlying cluster structure directly from the SOM and is robust to different net sizes.
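
For readers unfamiliar with the SOM step the abstract builds on, the following is a minimal sketch of a standard online SOM in Python with NumPy. It is not the authors' implementation and does not reproduce the RP attribute or the Q statistic, which the abstract does not specify. The function names (`train_som`, `assign_nodes`), the linear decay schedules, the Gaussian neighborhood, and all default parameter values are assumptions chosen for illustration.

```python
import numpy as np


def train_som(data, rows=4, cols=4, n_iter=500, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM (illustrative sketch, not the paper's code).

    Returns the codebook (rows*cols, dim) and the node grid coordinates (rows*cols, 2).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_samples, _ = data.shape

    # Fixed 2-D grid position of every node on the map.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

    # Assumption: initialise codebook vectors from randomly chosen samples.
    codebook = data[rng.integers(0, n_samples, size=rows * cols)].copy()

    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0

    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                      # assumed linear learning-rate decay
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac   # neighborhood radius shrinks toward 0.5

        x = data[rng.integers(0, n_samples)]         # one random input vector

        # Best-matching unit: the node whose codebook vector is closest to x.
        bmu = np.argmin(np.sum((codebook - x) ** 2, axis=1))

        # Gaussian neighborhood measured on the map grid, not in input space.
        grid_d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))

        # Move every codebook vector toward x, weighted by its neighborhood value.
        codebook += lr * h[:, None] * (x - codebook)

    return codebook, grid


def assign_nodes(data, codebook):
    """Return, for each sample, the index of its best-matching node."""
    data = np.asarray(data, dtype=float)
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

In this sketch each codebook update is a weighted step toward an input vector. That movement in the input space is what the abstract says the RP attribute imitates in a two-dimensional virtual space. How the RP is actually updated is not stated here, so it is left out.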