A distributed, parallel system for large-scale structure recognition in gene expression data

Authors:
Jens Ernst
Affiliations:
Lehrstuhl für Effiziente Algorithmen, Institut für Informatik, Technische Universität München
Venue:
HPCC'06: Proceedings of the Second International Conference on High Performance Computing and Communications
Year:
2006

Abstract

Owing to the development of DNA microarrays, a very high-throughput lab technology, it has become feasible for scientists to monitor the transcriptional activity of all known genes in many living organisms. Such assays are typically conducted repeatedly, along a time course or across a series of predefined experimental conditions, yielding a set of expression profiles. Arranging these profiles into subsets based on their pairwise similarity is known as clustering. Clusters of genes exhibiting similar expression behavior are often related in a biologically meaningful way, which is of central interest to research in functional genomics. We present a distributed, parallel system based on spectral graph theory and numerical linear algebra that can solve this problem for datasets generated by the latest generation of microarrays, even at high levels of experimental noise. It allows us to process hundreds of thousands of expression profiles, thereby vastly increasing the current size limit for unsupervised clustering with full similarity information.
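
The abstract does not spell out the algorithm, so the following is only a minimal, single-machine sketch of the general idea of clustering expression profiles from full pairwise similarity with spectral methods. It is not the distributed system described in the paper. It assumes Pearson correlation as the similarity measure and splits the profiles in two by the sign of the Fiedler vector of the graph Laplacian, which is approximated here by power iteration. All function names and the toy data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def similarity_matrix(profiles):
    """Full pairwise similarity in [0, 1]; the diagonal stays 0 (no self-loops).

    Assumption: similarity = (Pearson correlation + 1) / 2.
    """
    n = len(profiles)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = (pearson(profiles[i], profiles[j]) + 1.0) / 2.0
    return w


def fiedler_vector(w, iterations=500, seed=0):
    """Approximate the Fiedler vector of the Laplacian L = D - W.

    Runs power iteration on c*I - L, where c bounds the largest eigenvalue
    of L, and projects out the constant vector at every step.
    """
    n = len(w)
    degree = [sum(row) for row in w]
    c = 2.0 * max(degree)  # Gershgorin bound on the spectrum of L
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        # (c*I - L) v = (c - d_i) v_i + sum_j W_ij v_j
        v = [(c - degree[i]) * v[i] + sum(w[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def bipartition(profiles):
    """Assign each profile to cluster 0 or 1 by the sign of the Fiedler vector."""
    v = fiedler_vector(similarity_matrix(profiles))
    return [1 if x > 0 else 0 for x in v]


def toy_profiles(seed=1):
    """Ten noisy 12-point time courses: five follow a sine, five its negation."""
    rng = random.Random(seed)
    base = [math.sin(2.0 * math.pi * t / 12.0) for t in range(12)]
    up = [[b + rng.gauss(0.0, 0.1) for b in base] for _ in range(5)]
    down = [[-b + rng.gauss(0.0, 0.1) for b in base] for _ in range(5)]
    return up + down


if __name__ == "__main__":
    # Which group gets label 0 or 1 depends on the sign of the eigenvector.
    print(bipartition(toy_profiles()))
```

The full similarity matrix in this sketch has one entry per pair of profiles, so its size grows quadratically with the number of profiles. At hundreds of thousands of profiles that no longer fits comfortably on a single machine, which is one plausible reason to distribute the computation.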