Discriminatory mining of gene expression microarray data

Authors:
Zuyi Wang;Yue Wang;Jianping Lu;Sun-Yuan Kung;Junying Zhang;Richard Lee;Jianhua Xuan;Javed Khan;Robert Clarke
Affiliations:
Center for Genetic Research, Children's National Medical Center, Washington, DC and Department of Electrical Engineering and Computer Science, The Catholic University of America, Washington, DC;Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Alexandria, VA;Department of Electrical Engineering and Computer Science, The Catholic University of America, Washington, DC;Department of Electrical Engineering, Princeton University, Princeton, NJ;Institute of Electrical Engineering and Institute of Computer Science, Xidian University, Xi'an, P.R. China 710071 and Department of Electrical Engineering and Computer Science, The Catholic Unive ...;Lombardi Cancer Center, Georgetown University, Washington, DC;Department of Electrical Engineering and Computer Science, The Catholic University of America, Washington, DC;National Human Genome Research Institute, National Institutes of Health, Bethesda, MD;Lombardi Cancer Center, Georgetown University, Washington, DC
Venue:
Journal of VLSI Signal Processing Systems - Special issue on signal processing and neural networks for bioinformatics
Year:
2003

Abstract

Recent advances in machine learning and pattern recognition provide new analytical tools for exploring high-dimensional gene expression microarray data. Our data mining software, VISual Data Analyzer for cluster discovery (VISDA), reveals many distinguishing patterns among gene expression profiles, which are responsible for the cell's phenotypes. The model-supported exploration of the high-dimensional data space is achieved through two complementary schemes: dimensionality reduction by discriminatory data projection and cluster decomposition by soft data clustering. Dimensionality reduction produces a visualization of the complete data set at the top level. The data set is then partitioned into subclusters, which can subsequently be visualized at lower levels and, if necessary, partitioned again. In this paper, three algorithms are evaluated for their ability to reduce dimensionality and to visualize data sets: Principal Component Analysis (PCA), Discriminatory Component Analysis (DCA), and the Projection Pursuit Method (PPM). The partitioning into subclusters uses the Expectation-Maximization (EM) algorithm and a hierarchical normal mixture model that is selected by the user and verified as "optimal" by the Minimum Description Length (MDL) criterion. These approaches produce different visualizations, which are compared against known phenotypes from the microarray experiments. Overall, these algorithms and user-selected models support the exploration of high-dimensional data where standard analyses may not be sufficient.
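
The two schemes described in the abstract, projection followed by soft clustering with MDL-based model checking, can be illustrated with a minimal sketch. This is not the VISDA implementation. It assumes PCA as the only projection (DCA and PPM are not shown), a single flat Gaussian mixture rather than the hierarchical model, and a common two-part form of MDL (negative log-likelihood plus half the free-parameter count times log n). The function names `pca_project`, `em_gmm`, and `mdl_score`, and the synthetic data, are hypothetical and used for illustration only.

```python
import numpy as np

def pca_project(X, k=2):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def _log_joint(Y, weights, means, covs):
    """log(weight_j * N(y_i | mean_j, cov_j)) for every sample i, component j."""
    n, d = Y.shape
    out = np.empty((n, len(weights)))
    for j in range(len(weights)):
        diff = Y - means[j]
        inv = np.linalg.inv(covs[j])
        _, logdet = np.linalg.slogdet(covs[j])
        maha = np.einsum("ni,ij,nj->n", diff, inv, diff)
        out[:, j] = np.log(weights[j]) - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
    return out

def em_gmm(Y, K, n_iter=100, seed=0):
    """Fit a K-component full-covariance Gaussian mixture by EM.

    Returns (soft memberships of shape (n, K), total log-likelihood).
    """
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    reg = 1e-6 * np.eye(d)  # small ridge keeps covariances invertible
    means = Y[rng.choice(n, K, replace=False)]
    covs = np.array([np.atleast_2d(np.cov(Y, rowvar=False)) + reg] * K)
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        log_p = _log_joint(Y, weights, means, covs)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means, and covariances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ Y) / nk[:, None]
        for j in range(K):
            diff = Y - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg
    # Final E-step so memberships and likelihood match the final parameters.
    log_p = _log_joint(Y, weights, means, covs)
    log_norm = np.logaddexp.reduce(log_p, axis=1)
    return np.exp(log_p - log_norm[:, None]), log_norm.sum()

def mdl_score(log_lik, n, d, K):
    """Two-part MDL: lower is better (assumed form, see lead-in)."""
    n_params = (K - 1) + K * d + K * d * (d + 1) // 2
    return -log_lik + 0.5 * n_params * np.log(n)

# Synthetic stand-in for expression profiles: 3 groups of 30 samples, 20 features.
rng = np.random.default_rng(1)
centers = 8.0 * np.eye(20)[:3]
X = np.vstack([c + rng.normal(size=(30, 20)) for c in centers])

Y = pca_project(X, k=2)  # top-level 2-D view of the whole data set
for K in (1, 2, 3, 4):
    _, log_lik = em_gmm(Y, K)
    print(K, round(mdl_score(log_lik, len(Y), Y.shape[1], K), 1))
```

In this sketch the candidate number of clusters is simply scanned and scored. In the workflow the abstract describes, the user proposes the model from the visualization and MDL serves as a check on that choice. Repeating the same projection and fit on the samples assigned to each component would give the lower-level views.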