Mining gene expression patterns for the discovery of overlapping clusters

Authors:
Patrick C. H. Ma; Keith C. C. Chan
Affiliations:
Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China (both authors)
Venue:
EvoBIO'08 Proceedings of the 6th European conference on Evolutionary computation, machine learning and data mining in bioinformatics
Year:
2008

Abstract

Many clustering algorithms have been used to identify co-expressed genes in gene expression data. Because proteins typically interact with different groups of proteins to serve different biological roles in response to different external stimuli, the genes that produce these proteins are expected to co-express with more than one group of genes and therefore to belong to more than one cluster. This poses a challenge to existing clustering algorithms, because overlapping clusters need to be discovered in a noisy environment. In this paper, we propose a clustering approach that consists of an initial clustering phase followed by a re-clustering phase. The proposed approach has several desirable features. It uses both local and global information inherent in gene expression data to discover overlapping clusters, computing a local pairwise similarity measure between gene expression profiles and a global probabilistic measure of the interestingness of hidden patterns. During re-clustering, it is able to distinguish between relevant and irrelevant expression data. In addition, it makes the patterns discovered in each cluster explicit for easy interpretation. For performance evaluation, the proposed approach has been tested on both simulated and real expression data sets. Experimental results show that it can uncover interesting patterns in noisy gene expression data; based on these patterns, overlapping clusters can be discovered, and the expression levels at which each cluster of genes co-expresses under different conditions can be better understood.
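
To make the two-phase idea concrete, the following is a minimal illustrative sketch, not the authors' algorithm. The abstract does not specify the similarity measure, the probabilistic interestingness measure, or any thresholds, so everything here is an assumption: Pearson correlation stands in for the local pairwise similarity, a simple correlation test against each cluster centroid stands in for the global pattern-based measure, and the function names, the threshold of 0.5, and the toy profiles are all hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_a * var_b)


def initial_clustering(profiles, threshold):
    """Phase 1 (assumed): greedy, non-overlapping grouping.

    Each gene joins the first cluster whose seed gene (its first member)
    is similar enough; otherwise it starts a new cluster.
    """
    clusters = []
    for gene, profile in profiles.items():
        for members in clusters:
            if pearson(profile, profiles[members[0]]) >= threshold:
                members.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters


def centroid(members, profiles):
    """Mean expression level of a cluster under each condition."""
    columns = zip(*(profiles[g] for g in members))
    return [sum(col) / len(members) for col in columns]


def recluster(profiles, clusters, threshold):
    """Phase 2 (assumed): a gene joins every cluster whose centroid it matches."""
    centroids = [centroid(members, profiles) for members in clusters]
    return [
        sorted(g for g, p in profiles.items() if pearson(p, c) >= threshold)
        for c in centroids
    ]


# Hypothetical toy data: four conditions, gene "x" responds in both groups.
profiles = {
    "a1": [1, 0, 0, 0],
    "a2": [2, 0.1, 0, 0],
    "b1": [0, 0, 0, 1],
    "b2": [0, 0.1, 0, 2],
    "x": [1, 0, 0, 1],
}
initial = initial_clustering(profiles, 0.5)
overlapping = recluster(profiles, initial, 0.5)
print(initial)      # [['a1', 'a2', 'x'], ['b1', 'b2']]
print(overlapping)  # [['a1', 'a2', 'x'], ['b1', 'b2', 'x']]
```

In this toy run, the first phase places gene "x" in a single cluster, while the second phase lets it belong to both, which is the kind of overlap the abstract argues a clustering method should be able to recover.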