Software note: Rival penalized competitive learning (RPCL): a topology-determining algorithm for analyzing gene expression data

  • Authors:
  • T. Murlidharan Nair; Christina L. Zheng; J. Lynn Fink; Robert O. Stuart; Michael Gribskov

  • Affiliations:
  • San Diego Supercomputer Center, University of California at San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0537, USA; San Diego Supercomputer Center, University of California at San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0537, USA; San Diego Supercomputer Center, University of California at San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0537, USA; Department of Medicine and Pediatrics, Division of Nephrology-Hypertension, Cancer Center, University of California at San Diego, La Jolla, CA 92093, USA; San Diego Supercomputer Center, University of California at San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0537, USA

  • Venue:
  • Computational Biology and Chemistry
  • Year:
  • 2003

Abstract

DNA arrays have become the method of choice for large-scale gene expression measurements. Computational analysis of gene expression patterns can provide functional information about newly identified genes. The expression pattern of a cell's genes is an indicator of its state, and abnormal cellular states can be inferred by comparing expression profiles. Because co-regulated genes, and genes involved in a common pathway, tend to show similar expression patterns, clustering has become the natural way to separate them into groups. However, most methods based on cluster analysis suffer from two familiar problems: (i) dead units, that is, cluster prototypes that never win the competition for data points and therefore never learn, and (ii) determining the correct number of clusters (k) needed to classify the data. Selecting k has been an open problem in pattern recognition and statistics for decades, and because clustering reveals the similar patterns present in the data, the value chosen strongly influences the quality of the result. Although there is no theoretical solution to this problem, the number of clusters can be decided by a heuristic clustering algorithm called rival penalized competitive learning (RPCL). We present a novel implementation of RPCL that transforms the problem of finding the correct number of clusters into the tractable problem of clustering by degree of similarity. This is biologically significant because our implementation groups functionally co-regulated genes and genes that show similar expression patterns, revealing genes that are potentially co-involved in a biological process. The implementation is useful for distinguishing groups involved in concerted functional regulation and helps to progressively home in on closely similar patterns.
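The abstract does not spell out the RPCL update rule. As a rough illustration only, the sketch below follows the standard RPCL scheme as originally described by Xu, Krzyzak and Oja, not the authors' similarity-based implementation. Training starts with more units than expected clusters (the hypothetical parameter `k_max`). For each input, the closest unit under a frequency-weighted distance (the winner) moves toward the input at rate `alpha_c`, while the second closest unit (the rival) is pushed away at a much smaller rate `alpha_r`. The function names, learning rates, epoch count and the rule "units that end up owning no data points are discarded" are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical sketch of standard RPCL; names and defaults are illustrative
# assumptions, not values taken from the paper.


def rpcl_step(weights, wins, x, alpha_c, alpha_r):
    """Apply one RPCL update for input x, modifying weights and wins in place.

    Returns the indices of the winner and the rival.
    """
    # Frequency-sensitive distance: units that have won often look "farther",
    # which gives rarely winning units a chance and counters dead units.
    gamma = wins / wins.sum()
    dist = gamma * np.sum((weights - x) ** 2, axis=1)
    winner, rival = np.argsort(dist)[:2]
    # The winner learns: it moves toward x.
    weights[winner] += alpha_c * (x - weights[winner])
    # The rival is penalized (de-learning): it moves away from x, more slowly.
    weights[rival] -= alpha_r * (x - weights[rival])
    wins[winner] += 1
    return winner, rival


def rpcl(data, k_max, alpha_c=0.05, alpha_r=0.002, epochs=50, seed=0):
    """Train k_max units with RPCL and return (weights, labels, n_active)."""
    if k_max < 2:
        raise ValueError("RPCL needs at least two units (a winner and a rival)")
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Start with more units than the expected number of clusters,
    # initialized at randomly chosen data points.
    weights = data[rng.choice(len(data), size=k_max, replace=False)].copy()
    wins = np.ones(k_max)
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            rpcl_step(weights, wins, data[i], alpha_c, alpha_r)
    # Assumed read-out: assign each point to its nearest unit; units left with
    # no points count as driven out, and the survivors estimate k.
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    n_active = len(np.unique(labels))
    return weights, labels, n_active
```

Keeping `alpha_r` much smaller than `alpha_c` lets the winner settle on a cluster while surplus units are gradually pushed away from the data. Whether every surplus unit is expelled depends on the data and on the ratio of the two rates, so these values would need tuning on real expression profiles.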