Silhouettes: a graphical aid to the interpretation and validation of cluster analysis
Journal of Computational and Applied Mathematics
Self-organization and associative memory: 3rd edition
Self-organization and associative memory: 3rd edition
Pattern Recognition with Fuzzy Objective Function Algorithms
Pattern Recognition with Fuzzy Objective Function Algorithms
Class-Dependent Discretization for Inductive Learning from Continuous and Mixed-Mode Data
IEEE Transactions on Pattern Analysis and Machine Intelligence
Enhanced Biclustering on Expression Data
BIBE '03 Proceedings of the 3rd IEEE Symposium on BioInformatics and BioEngineering
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
A Practical Approach to Microarray Data Analysis
A Practical Approach to Microarray Data Analysis
A novel evolutionary data mining algorithm with applications to churn prediction
IEEE Transactions on Evolutionary Computation
Hi-index | 0.00 |
Many clustering algorithms have been used to identify co-expressed genes in gene expression data. Since proteins typically interact with different groups of proteins in order to serve different biological roles, when responding to different external stimulants, the genes that produce these proteins are expected to co-express with more than one group of genes and therefore belong to more than one cluster. This poses a challenge to existing clustering algorithms as there is a need for overlapping clusters to be discovered in a noisy environment. For this reason, we propose an effective clustering approach, which consists of an initial clustering phase and a second re-clustering phase, in this paper. The proposed approach has several desirable features as follows. It makes use of both local and global information inherent in gene expression data to discover overlapping clusters by computing both a local pairwise similarity measure between gene expression profiles and a global probabilistic measure of interestingness of hidden patterns. When performing re-clustering, the proposed approach is able to distinguish between relevant and irrelevant expression data. In addition, it is able to make explicit the patterns discovered in each cluster for easy interpretation. For performance evaluation, the proposed approach has been tested with both simulated and real expression data sets. Experimental results show that it is able to effectively uncover interesting patterns in noisy gene expression data so that, based on these patterns, overlapping clusters can be discovered and also the expression levels at which each cluster of genes co-expresses under different conditions can be better understood.