A Computational Approach to Gene Expression Data Extraction and Analysis

  • Authors:
  • Alan Wee-Chung Liew;Lap Keung Szeto;Sy-Sen Tang;Hong Yan;Mengsu Yang

  • Affiliations:
  • Department of Computer Engineering and Information Technology, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong;Department of Computer Engineering and Information Technology, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong;Department of Computer Engineering and Information Technology, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong;Department of Computer Engineering and Information Technology, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong / School of Electrical and Information Engineering, Universi ...;Department of Biology and Chemistry, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong

  • Venue:
  • Journal of VLSI Signal Processing Systems
  • Year:
  • 2004

Abstract

The rapid advancement of DNA microarray technology has revolutionized genetic research in bioscience. Because this technology generates an enormous amount of gene expression data, computer processing and analysis of the data have become indispensable. In this paper, we present a computational framework for the extraction, analysis and visualization of gene expression data from microarray experiments. A novel, fully automated spot segmentation algorithm for DNA microarray images, which makes use of adaptive thresholding, morphological processing and statistical intensity modeling, is proposed to: (i) segment the blocks of spots, (ii) generate the grid structure, and (iii) segment the spot within each subregion. For data analysis, we propose a binary hierarchical clustering (BHC) framework for the clustering of gene expression data. The BHC algorithm involves two major steps. First, the fuzzy C-means algorithm and the average linkage hierarchical clustering algorithm are used to split the data into two classes. Second, Fisher linear discriminant analysis is applied to the two classes to assess whether the split is acceptable. The BHC algorithm is applied to the sub-classes recursively and ends when no cluster can be split any further. BHC does not require the number of clusters to be known in advance, and it makes no assumptions about the number of samples in each cluster or the class distribution. The hierarchical framework naturally leads to a tree structure representation for effective visualization of gene expressions.
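
The recursive split-and-test idea behind BHC can be illustrated with a minimal Python sketch. This is not the authors' implementation: it simplifies the splitting step to a two-class fuzzy C-means only (the average linkage stage is omitted), and the function names, the Fisher ratio threshold of 10 and the minimum cluster size are hypothetical choices made for illustration.

```python
import numpy as np

def fuzzy_two_means(X, m=2.0, iters=100, tol=1e-6):
    """Split the rows of X into two groups with fuzzy C-means (C = 2)."""
    # Assumed deterministic start: two mutually distant data points.
    first = X[np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))]
    second = X[np.argmax(np.linalg.norm(X - first, axis=1))]
    centers = np.stack([first, second])
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)      # fuzzy memberships
        W = U ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return U.argmax(axis=1)                           # harden the memberships

def fisher_separation(A, B):
    """Fisher criterion of two groups along their discriminant direction."""
    diff = A.mean(axis=0) - B.mean(axis=0)
    Ac, Bc = A - A.mean(axis=0), B - B.mean(axis=0)
    Sw = Ac.T @ Ac + Bc.T @ Bc                        # within-class scatter
    dim = Sw.shape[0]
    Sw = Sw + np.eye(dim) * (1e-6 * np.trace(Sw) / dim + 1e-12)  # regularize
    w = np.linalg.solve(Sw, diff)                     # discriminant direction
    pa, pb = A @ w, B @ w
    return (pa.mean() - pb.mean()) ** 2 / (pa.var() + pb.var() + 1e-12)

def bhc(X, idx=None, threshold=10.0, min_size=4):
    """Return a list of index arrays, one per leaf cluster."""
    if idx is None:
        idx = np.arange(len(X))
    if len(idx) < 2 * min_size:
        return [idx]
    labels = fuzzy_two_means(X[idx])
    left, right = idx[labels == 0], idx[labels == 1]
    if min(len(left), len(right)) < min_size:
        return [idx]
    # Accept the split only if the two halves are well separated.
    if fisher_separation(X[left], X[right]) < threshold:
        return [idx]
    return (bhc(X, left, threshold, min_size)
            + bhc(X, right, threshold, min_size))
```

Calling `bhc(X)` on an array whose rows are expression profiles returns one index array per leaf cluster. The number of leaves is not specified in advance; it follows from which splits pass the separation test, and the threshold value would need tuning for real data.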