A Biclustering Method to Discover Co-regulated Genes Using Diverse Gene Expression Datasets

Authors:
Doruk Bozdağ;Jeffrey D. Parvin;Umit V. Catalyurek
Affiliations:
Biomedical Informatics, The Ohio State University, and Electrical and Computer Engineering, The Ohio State University;Biomedical Informatics, The Ohio State University;Biomedical Informatics, The Ohio State University, and Electrical and Computer Engineering, The Ohio State University
Venue:
BICoB '09 Proceedings of the 1st International Conference on Bioinformatics and Computational Biology
Year:
2009

Abstract

We propose a two-step biclustering approach that mines the co-regulation patterns of a given reference gene to discover other genes that function in a common biological process. Currently, several successful methods use Pearson Correlation Coefficient (PCC) based gene expression analysis across all samples in a dataset. However, microarray datasets are fraught with spurious samples or samples of diverse origin, so many genes/proteins that function in the same biological pathway may be missed. The novel PCC-based biclustering algorithm introduced in this paper identifies subsets of genes with high correlation by stringently filtering the data, reducing false negatives due to spurious or unrelated samples in a dataset. The correlation information extracted from the resulting biclusters is then synthesized. We applied our method using the breast cancer associated tumor suppressors BRCA1 and BRCA2 as the reference proteins to reveal genes and proteins important in the complex process of breast tumor formation. Experiments on 20 very large datasets showed that the top-ranked genes were remarkably enriched for genes that regulate the mitotic spindle and cytokinesis. The results imply that the BRCA1 and BRCA2 proteins, which are considered to be DNA repair factors, have a critical function in the mitotic spindle as well. Initial biological verification reveals that this identified factor functions to control both centrosome dynamics and, surprisingly, DNA repair. Thus, this biclustering approach is successful at identifying proteins with highly related function from extremely complex datasets, and permits novel insights into gene function.
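
To illustrate why restricting PCC to a subset of samples can recover a relationship that the all-sample PCC hides, the following is a minimal sketch and not the algorithm from the paper. The function `greedy_sample_filter`, its `threshold` and `min_samples` parameters, and the toy expression values are all hypothetical, chosen only to show the idea on one reference/candidate gene pair within a single dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # PCC is undefined for a constant profile; treat as 0 here
    return cov / math.sqrt(vx * vy)


def greedy_sample_filter(ref, gene, threshold=0.9, min_samples=4):
    """Hypothetical greedy filter (an assumption, not the paper's method).

    Repeatedly drops the single sample whose removal raises the PCC between
    the reference profile and the candidate profile the most, stopping once
    the PCC reaches `threshold`, no removal helps, or only `min_samples`
    samples remain. Returns the kept sample indices and their PCC.
    """
    keep = list(range(len(ref)))

    def pcc(indices):
        return pearson([ref[i] for i in indices], [gene[i] for i in indices])

    current = pcc(keep)
    while current < threshold and len(keep) > min_samples:
        best_r, best_drop = max(
            (pcc([i for i in keep if i != d]), d) for d in keep
        )
        if best_r <= current:
            break  # no single removal improves the correlation
        keep.remove(best_drop)
        current = best_r
    return keep, current


# Toy data: samples 0-5 are co-regulated (gene = 2 * ref),
# samples 6 and 7 play the role of spurious or unrelated samples.
ref = [1, 2, 3, 4, 5, 6, 3, 3]
gene = [2, 4, 6, 8, 10, 12, 20, -10]

full_r = pearson(ref, gene)               # PCC across all samples
keep, r = greedy_sample_filter(ref, gene)  # PCC on the filtered sample subset
print(f"all samples: {full_r:.3f}; kept samples {keep}: {r:.3f}")
```

In this toy case the two spurious samples pull the all-sample PCC well below the threshold, while the filtered subset recovers the underlying linear relationship. A real bicluster search would operate over many genes at once and would need safeguards against overfitting on small sample subsets, which this sketch does not attempt.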