Clustering of diverse genomic data using information fusion

Authors:
Jyotsna Kasturi;Raj Acharya
Affiliations:
Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA;Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA
Venue:
Bioinformatics
Year:
2005

Abstract

Motivation: Genome sequencing projects and high-throughput technologies such as DNA and protein arrays have produced a very large amount of information-rich data. Microarray experimental data are a valuable but limited source for inferring gene regulation mechanisms on a genomic scale. Additional information, such as promoter sequences of genes/DNA binding motifs, gene ontologies, and location data, can increase the statistical significance of the findings when combined with gene expression analysis. This paper introduces a machine learning approach to information fusion for combining heterogeneous genomic data. The algorithm uses an unsupervised joint learning mechanism that identifies clusters of genes using the combined data.

Results: The correlation between gene expression time-series patterns obtained from different experimental conditions and the presence of several distinct and repeated motifs in their upstream sequences is examined here using publicly available yeast cell-cycle data. The results show that the combined learning approach taken here identifies correlated genes effectively. The clustering is automated, but the user can specify a priori, as probabilities, the influence of each data type on the final clustering.

Availability: Software code is available by request from the first author.

Contact: jkasturi@cse.psu.edu
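
The abstract does not describe the joint learning mechanism in detail, so the following is only a minimal sketch of the general idea of weighting each data type by a user-specified probability before clustering. It is not the authors' algorithm: the weighted sum of distance matrices, the Euclidean distance, the k-medoids step, the function names, the toy data, and the weights 0.7/0.3 are all assumptions made for illustration.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X, rescaled to [0, 1]."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    max_d = D.max()
    return D / max_d if max_d > 0 else D


def fuse_distances(matrices, weights):
    """Weighted sum of per-data-type distance matrices.

    The weights play the role of the a priori probabilities that set the
    influence of each data type (an assumed fusion rule, not the paper's).
    """
    weights = np.asarray(weights, dtype=float)
    if len(matrices) != len(weights):
        raise ValueError("need exactly one weight per data type")
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * D for w, D in zip(weights, matrices))


def k_medoids(D, k, n_iter=100):
    """Plain k-medoids on a precomputed distance matrix D."""
    # Deterministic start: the most central object, then farthest-first.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        nearest = D[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: member with the smallest total distance
                # to the other members of its cluster.
                costs = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return labels, medoids


# Toy data (hypothetical): 6 genes, 4 expression time points,
# and counts of 3 upstream motifs per gene.
expression = np.array([
    [0.1, 0.9, 0.2, 0.8],
    [0.2, 1.0, 0.1, 0.9],
    [0.0, 0.8, 0.3, 0.7],
    [0.9, 0.1, 0.8, 0.2],
    [1.0, 0.2, 0.9, 0.1],
    [0.8, 0.0, 0.7, 0.3],
])
motif_counts = np.array([
    [3, 0, 1],
    [2, 0, 1],
    [3, 1, 0],
    [0, 3, 2],
    [0, 2, 3],
    [1, 3, 2],
])

expression_dist = pairwise_distances(expression)
motif_dist = pairwise_distances(motif_counts)

# Assumed prior influence: 70% expression, 30% motif content.
fused = fuse_distances([expression_dist, motif_dist], weights=[0.7, 0.3])
labels, medoids = k_medoids(fused, k=2)
print(labels)
```

Changing the weights shifts how much each data type drives the grouping: weights of 1.0 and 0.0 reduce the sketch to clustering on expression alone, while intermediate values let motif content pull apart genes whose expression profiles look similar.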