VizCluster and its Application on Classifying Gene Expression Data

  • Authors:
  • Li Zhang; Chun Tang; Yuqing Song; Aidong Zhang; Murali Ramanathan

  • Affiliations:
  • Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY 14260, USA. lizhang@cse.buffalo.edu; Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY 14260, USA. chuntang@cse.buffalo.edu; Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY 14260, USA. ys2@cse.buffalo.edu; Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY 14260, USA. azhang@cse.buffalo.edu; Department of Pharmaceutical Sciences, State University of New York at Buffalo, Buffalo, NY 14260, USA. murali@acsu.buffalo.edu

  • Venue:
  • Distributed and Parallel Databases
  • Year:
  • 2003

Visualization

Abstract

Visualization enables us to find structures, features, patterns, and relationships in a dataset by presenting the data in various graphical forms with possible interactions. A visualization can provide a qualitative overview of large and complex datasets, can summarize data, and can assist in identifying regions of interest and appropriate parameters for more focused quantitative analysis. Recently, DNA microarray technology has provided a broad snapshot of the state of the cell by measuring the expression levels of thousands of genes simultaneously. Such information can be used to analyze different samples by their gene expression profiles. This technology has already had a significant impact on the field of bioinformatics, which now requires innovative techniques to efficiently and effectively extract, analyze, and visualize these fast-growing data.

In this paper, we present a dynamic interactive visualization environment, VizCluster, and its application to classifying gene expression data. VizCluster takes advantage of graphical visualization methods to reveal underlying data patterns. It combines the merits of both the high-dimensional projection scatter plot and the parallel coordinate plot. At its core lies a nonlinear projection that maps n-dimensional vectors onto two-dimensional points. To preserve information at different scales while reducing the clutter from overlapping lines that typically affects parallel coordinate plots, we propose a zip zooming viewing method. Integrated with other features, VizCluster is designed to give a simple, fast, intuitive, and yet powerful view of the dataset. Its primary applications are the classification of samples and the evaluation of gene clusters for microarray datasets. Three gene expression datasets are used to illustrate the approach. We demonstrate that the VizCluster approach is promising for analyzing and visualizing microarray datasets and that further development is worthwhile.
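The abstract states that VizCluster is built around a nonlinear projection from n-dimensional vectors to two-dimensional points, but it does not give the formula. To make the general idea concrete, the sketch below uses a RadViz-style projection as a stand-in. This is an assumption chosen for illustration and is not claimed to be the VizCluster mapping. Each dimension is assigned an anchor on the unit circle, and a vector is placed at the weighted average of those anchors; the division by the sum of the components is what makes the mapping nonlinear. The function names `min_max_scale` and `radviz_project` are hypothetical.

```python
import math


def min_max_scale(rows):
    """Scale each column of a list of equal-length rows to the range [0, 1].

    RadViz-style projections assume non-negative inputs, so expression
    values are rescaled per dimension first. Constant columns map to 0.0.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


def radviz_project(vector):
    """Map one n-dimensional non-negative vector to a 2-D point.

    Illustrative stand-in only (RadViz-style), not the VizCluster formula.
    Dimension k gets an anchor at angle 2*pi*k/n on the unit circle; the
    point is the average of the anchors weighted by the vector components.
    """
    n = len(vector)
    total = sum(vector)
    if total == 0:
        # All-zero vectors have no preferred direction; place at the origin.
        return (0.0, 0.0)
    x = 0.0
    y = 0.0
    for k, value in enumerate(vector):
        angle = 2.0 * math.pi * k / n
        x += value * math.cos(angle)
        y += value * math.sin(angle)
    return (x / total, y / total)


if __name__ == "__main__":
    # Toy data: three samples measured on four genes (made-up values).
    samples = [
        [2.0, 0.5, 0.1, 0.3],
        [0.2, 1.8, 0.4, 0.1],
        [0.1, 0.3, 2.2, 0.6],
    ]
    for point in map(radviz_project, min_max_scale(samples)):
        print(point)
```

In a projection of this kind, samples with similar relative expression profiles land near each other in the plane, which is the property that makes a 2-D scatter plot useful for inspecting sample classes. A vector dominated by a single dimension is pulled toward that dimension's anchor, while a vector with equal components falls at the center.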