Multi-Metric and Multi-Substructure Biclustering Analysis for Gene Expression Data

Authors: S. Y. Kung; Man-Wai Mak; Ilias Tagkopoulos
Affiliations: Princeton University; Hong Kong Polytechnic University; Princeton University
Venue: CSB '05 Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference
Year: 2005

Abstract

A good number of biclustering algorithms have been proposed for grouping gene expression data, and many of them adopt matrix norms to define the similarity score of a bicluster. We show that almost all of these matrix metrics can be converted into vector norms while preserving rank equivalence. Vector norms provide a much more efficient vehicle for biclustering, and the advantages are twofold: ease of analysis and savings in computation. Most existing biclustering algorithms also implicitly assume univariate (i.e., single-metric) evaluation for identifying biclusters. Such an approach, however, overlooks two fundamental points: genes, even those belonging to the same gene group, (1) may be subdivided into different substructures, and (2) may be co-expressed via a diversity of coherence models (a gene may participate in multiple pathways that may or may not be co-active under all conditions). The former leads to the adoption of multi-substructure analysis, the latter to multivariate analysis. This paper shows that the proposed multivariate and multi-substructure analysis is very effective in identifying and classifying biologically relevant groups of genes and conditions. For example, it has yielded highly discriminant and accurate classification based on known ribosomal gene groups.
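
The rank-equivalence claim can be made concrete with a small sketch. This is not the paper's own derivation, which the abstract does not give. It assumes an additive-coherence residue (each entry minus its row mean and column mean, plus the overall mean) as the matrix being scored, and it compares the Frobenius norm of that residue matrix with the sum of squared Euclidean norms of its rows. The function names and the toy candidate blocks are hypothetical. Because the second score is the square of the first, the two order any set of candidate biclusters identically.

```python
import math

def residue(block):
    """Additive-coherence residue: a_ij - row_mean_i - col_mean_j + overall_mean.
    (Assumed scoring model for illustration; the abstract does not specify one.)"""
    n_rows, n_cols = len(block), len(block[0])
    row_means = [sum(row) / n_cols for row in block]
    col_means = [sum(row[j] for row in block) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    return [[block[i][j] - row_means[i] - col_means[j] + overall
             for j in range(n_cols)]
            for i in range(n_rows)]

def matrix_score(block):
    """Matrix metric: Frobenius norm of the residue matrix."""
    r = residue(block)
    return math.sqrt(sum(x * x for row in r for x in row))

def vector_score(block):
    """Vector metric: sum over genes (rows) of the squared Euclidean norm
    of each residue row. No square root is needed."""
    r = residue(block)
    return sum(sum(x * x for x in row) for row in r)

# Hypothetical toy blocks (rows = genes, columns = conditions).
candidates = {
    "additive": [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]],
    "noisy":    [[1.0, 2.0, 3.0], [2.0, 3.5, 4.0], [4.0, 5.0, 6.2]],
    "random":   [[1.0, 5.0, 2.0], [4.0, 0.0, 3.0], [2.0, 2.0, 9.0]],
}

# Lower score means a more coherent block under this assumed model.
rank_matrix = sorted(candidates, key=lambda k: matrix_score(candidates[k]))
rank_vector = sorted(candidates, key=lambda k: vector_score(candidates[k]))

print(rank_matrix)
print(rank_vector)
```

The vector form decomposes gene by gene, so the contribution of a single row can be inspected or updated on its own. That is one plausible reading of the "ease of analysis" advantage mentioned in the abstract.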