Discovering Biclusters by Iteratively Sorting with Weighted Correlation Coefficient in Gene Expression Data

Authors:
Li Teng;Laiwan Chan
Affiliations:
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, People's Republic of China;Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, People's Republic of China
Venue:
Journal of Signal Processing Systems
Year:
2008

Abstract

We propose a framework for biclustering gene expression profiles. The framework uses the dominant set approach to create sets of sorting vectors, which are then used to sort the rows of the data matrix so that coexpressed rows of gene expression vectors are gathered together. We iteratively sort and transpose the gene expression data matrix to gather blocks of coexpressed subsets. A weighted correlation coefficient measures similarity at both the gene level and the condition level, and its weights are updated at each iteration using the sorting vector from the previous iteration. As a result, the highly correlated bicluster ends up in one corner of the rearranged gene expression data matrix. We applied our approach to synthetic data and to three real gene expression data sets, with encouraging results. We also propose the average correlation value (ACV) to evaluate the homogeneity of a bicluster or a data matrix. This criterion conforms to the intuitive biological notion of a coexpressed set of genes or samples, and we compare it with the mean squared residue score. ACV is found to be more appropriate for both additive and multiplicative models.
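
The contrast between ACV and the mean squared residue score can be illustrated with a small numerical sketch. The abstract does not state the ACV formula, so the code below assumes one formulation: the larger of the average absolute Pearson correlation between distinct rows and between distinct columns. The function names and the toy matrices are hypothetical, and the mean squared residue follows the standard Cheng and Church definition.

```python
import numpy as np


def mean_squared_residue(a):
    """Mean squared residue score (Cheng and Church definition)."""
    a = np.asarray(a, dtype=float)
    row_mean = a.mean(axis=1, keepdims=True)
    col_mean = a.mean(axis=0, keepdims=True)
    residue = a - row_mean - col_mean + a.mean()
    return float((residue ** 2).mean())


def average_correlation_value(a):
    """ACV under an ASSUMED formulation (not given in the abstract):
    the larger of the mean absolute pairwise Pearson correlation
    between distinct rows and between distinct columns.
    Requires at least 2 rows and 2 columns, none of them constant.
    """
    a = np.asarray(a, dtype=float)

    def avg_abs_corr(m):
        k = m.shape[0]
        r = np.corrcoef(m)  # k x k correlation matrix of the rows of m
        # Subtract the k diagonal entries (each equal to 1) so that
        # only the k*k - k off-diagonal pairs are averaged.
        return (np.abs(r).sum() - k) / (k * k - k)

    return float(max(avg_abs_corr(a), avg_abs_corr(a.T)))


# Hypothetical toy biclusters: 3 genes (rows) by 4 conditions (columns).
base = np.array([1.0, 4.0, 2.0, 7.0])

# Additive model: each row is the base pattern plus a row offset.
additive = base[None, :] + np.array([0.0, 3.0, 10.0])[:, None]

# Multiplicative model: each row is the base pattern times a row scale.
multiplicative = base[None, :] * np.array([1.0, 2.0, 5.0])[:, None]

for name, m in [("additive", additive), ("multiplicative", multiplicative)]:
    print(name, "ACV:", average_correlation_value(m),
          "MSR:", mean_squared_residue(m))
```

Under these assumptions both toy matrices reach an ACV of 1, while the mean squared residue is zero only for the additive matrix. This is consistent with the abstract's claim that ACV suits both model types.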