Intelligent system for the analysis of microarray data using principal components and estimation of distribution algorithms

  • Authors:
  • C. Cano; F. García; F. J. López; A. Blanco

  • Affiliations:
  • Departamento de Ciencias de la Computación e I.A, E.T.S. Ingeniería Informática, University of Granada, 18071 Granada, Spain (all four authors)

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2009

Abstract

Background: Microarray technology makes it possible to measure the expression of thousands of genes simultaneously under tens of specific conditions. Clustering and biclustering are the main tools for analyzing gene expression data obtained from microarray experiments. Grouping together genes with the same behavior across samples can reveal relevant biological knowledge. Non-exclusive groupings are required, since a gene may play more than one biological role. Gene Shaving [Hastie, T., et al. (2000). Gene Shaving as a method for identifying distinct sets of genes with similar expression patterns. Genome Biology, 1, 1-21] is a popular clustering algorithm that looks for coherent clusters of genes with high variance across samples and allows the clusters to overlap.

Method: In this paper, we present an intelligent system for analyzing microarray data. The system implements three novel non-exclusive approaches to clustering and biclustering that aim to find coherent groups of genes with large between-sample variance: EDA-Clustering and EDA-Biclustering, both based on Estimation of Distribution Algorithms (EDA), and Gene-&-Sample Shaving, a biclustering algorithm based on Principal Components Analysis.

Results: We integrated the three proposed methods into a web-based platform and tested their performance on two real datasets. The results outperform those of Gene Shaving in terms of the quality and size of the revealed patterns. Furthermore, the system lets users visualize the results and validate them from a biological point of view using Gene Ontology annotations.
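
To make the baseline concrete, the following is a minimal sketch of the core shaving loop of Gene Shaving as it is generally described in Hastie et al. (2000): center each gene, compute the leading principal component, discard the fraction of genes least aligned with it, and repeat to obtain a nested sequence of candidate clusters. It is not the EDA-Clustering, EDA-Biclustering, or Gene-&-Sample Shaving method proposed in the paper, and it omits cluster-size selection and the orthogonalization step used to find further clusters. The function names, the toy matrix, the power-iteration approximation, and the shaving fraction `alpha` are all illustrative assumptions.

```python
import math
import random


def center_rows(matrix):
    """Subtract each gene's mean so every row has zero mean across samples."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([value - mean for value in row])
    return centered


def leading_component(rows, iterations=200):
    """Approximate the leading principal component (a vector over samples)
    of the given rows by power iteration on X^T X. Assumption: a plain
    power iteration is accurate enough for this illustration."""
    n_samples = len(rows[0])
    rng = random.Random(0)  # fixed seed keeps the sketch deterministic
    v = [rng.random() + 0.1 for _ in range(n_samples)]
    for _ in range(iterations):
        # scores = X v, then w = X^T scores
        scores = [sum(a * b for a, b in zip(row, v)) for row in rows]
        w = [sum(s * row[j] for s, row in zip(scores, rows))
             for j in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all rows are constant; no direction to find
        v = [x / norm for x in w]
    return v


def shaving_sequence(matrix, alpha=0.1):
    """Return nested clusters of gene indices, largest first.

    At each step the fraction `alpha` of genes with the smallest absolute
    inner product with the leading principal component is shaved off
    (at least one gene per step), until a single gene remains."""
    centered = center_rows(matrix)
    active = list(range(len(centered)))
    clusters = [list(active)]
    while len(active) > 1:
        pc = leading_component([centered[i] for i in active])
        score = {i: abs(sum(a * b for a, b in zip(centered[i], pc)))
                 for i in active}
        n_drop = min(max(1, int(alpha * len(active))), len(active) - 1)
        keep = sorted(active, key=lambda i: score[i], reverse=True)
        active = sorted(keep[:len(active) - n_drop])
        clusters.append(list(active))
    return clusters


# Hypothetical toy data: genes 0-3 vary strongly and coherently across
# four samples (gene 2 is anti-correlated); genes 4-7 are nearly flat.
data = [
    [5.0, -5.0, 4.0, -4.0],
    [4.5, -5.5, 4.2, -3.8],
    [-5.0, 5.0, -4.0, 4.0],
    [6.0, -4.0, 5.0, -5.0],
    [0.1, 0.0, -0.1, 0.0],
    [0.0, 0.1, 0.0, -0.1],
    [0.05, -0.05, 0.0, 0.0],
    [0.0, 0.0, 0.1, -0.1],
]
clusters = shaving_sequence(data, alpha=0.25)
```

Because the score uses the absolute inner product, anti-correlated genes such as gene 2 in the toy data stay in the same cluster as the genes they mirror. The nested sequence in `clusters` would then be passed to a separate step that chooses the cluster size, which this sketch does not attempt.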