MIB: Using mutual information for biclustering gene expression data

Authors:
Neelima Gupta;Seema Aggarwal
Affiliations:
Department of Computer Science, University of Delhi, Delhi 110 007, India;Department of Computer Science, University of Delhi, Delhi 110 007, India
Venue:
Pattern Recognition
Year:
2010

Abstract

The result of any biclustering or clustering algorithm depends on the choice of similarity measure. Most biclustering algorithms are based on Euclidean distance or the correlation coefficient. These measures capture only linear relationships between genes, but nonlinear dependencies may also exist among them. In this paper we propose an approach that uses mutual information for biclustering gene expression data. Mutual information is a more general measure that captures positive correlation, negative correlation, and nonlinear relationships. To the best of our knowledge, no existing biclustering algorithm has used mutual information as a similarity measure between two genes. We obtained biclusters from gene expression data of Arabidopsis thaliana and compared them with those obtained by two other algorithms, ISA and BIMAX. The biological significance of the biclusters was checked using the GO database. The genes belonging to our biclusters were significantly enriched with GO terms, with better p-values than the genes of the biclusters obtained by the other two algorithms. To investigate the biclusters further, we studied the promoter regions of the genes belonging to each bicluster for common patterns, that is, transcription factor binding sites (TFBS) or motifs. The promoter regions of the genes in most of our biclusters were found to share common motif patterns that exist in the motif database of Arabidopsis thaliana. The motifs extracted from our biclusters also had better E-values than those of the other algorithms. This reconfirms that using mutual information as a similarity measure produces better biclusters.
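The contrast between correlation and mutual information drawn in the abstract can be illustrated with a small synthetic sketch. This is not the authors' implementation: the abstract does not say how mutual information is estimated, so the histogram (plug-in) estimator, the bin count, and the names `mutual_information`, `gene_a`, `gene_b`, and `gene_c` are all assumptions made for illustration, and the data are simulated rather than taken from Arabidopsis thaliana.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of mutual information, in bits.

    Assumption: equal-width binning; the paper's estimator may differ.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()              # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    nz = pxy > 0                           # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))


# Synthetic expression profiles (hypothetical, for illustration only).
rng = np.random.default_rng(0)
gene_a = rng.uniform(-1.0, 1.0, size=2000)
gene_b = gene_a ** 2 + rng.normal(0.0, 0.05, size=2000)  # nonlinear in gene_a
gene_c = rng.uniform(-1.0, 1.0, size=2000)               # independent of gene_a

pearson_ab = float(np.corrcoef(gene_a, gene_b)[0, 1])
mi_ab = mutual_information(gene_a, gene_b)
mi_ac = mutual_information(gene_a, gene_c)

print(f"Pearson r(a, b) = {pearson_ab:.3f}")
print(f"MI(a, b) = {mi_ab:.3f} bits, MI(a, c) = {mi_ac:.3f} bits")
```

In this sketch the quadratic dependence leaves the Pearson coefficient close to zero, while the mutual information estimate for the dependent pair is clearly larger than for the independent pair. The plug-in estimate is biased upward for small samples and depends on the bin count, so the absolute values should be read as indicative only.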