Cluster Analysis for Gene Expression Data: A Survey
IEEE Transactions on Knowledge and Data Engineering
Data mining has become an important topic in the effective analysis of gene expression data because of its wide application in the biomedical industry. A gene cluster is a set of two or more genes that encode the same or similar products. Gene clustering, the process of grouping related genes into the same cluster, underlies many genomic studies that aim to analyse gene function. Several advanced techniques have been proposed for data clustering, and many of them have been applied to gene expression data with partial success. The goal of gene clustering is to identify important genes and to discover clusters among samples. This paper reviews three of the most representative off-line clustering techniques: fuzzy C-means clustering, hierarchical clustering, and mixed C-means clustering. These techniques are implemented and tested on a brain tumour gene expression dataset. Their performance is compared using 'goodness of clustering' evaluation measures, and mixed C-means performs better than the other two techniques on the brain tumour gene expression data.
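The abstract names fuzzy C-means without describing how it assigns genes to clusters. As a rough illustration only, and not the authors' implementation, the standard Bezdek-style fuzzy C-means loop can be sketched in plain Python: each expression profile receives a membership degree in every cluster (rows sum to 1), centroids are membership-weighted means, and memberships are updated as u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)). The function name `fuzzy_c_means`, the fuzzifier default m = 2, the tolerance, the iteration cap and the toy profiles are all assumptions chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_c_means(
    data: Sequence[Sequence[float]],
    c: int,
    m: float = 2.0,        # fuzzifier (assumed default); must be > 1
    max_iter: int = 300,   # assumed iteration cap
    tol: float = 1e-6,     # assumed stopping tolerance on membership change
    seed: int = 0,
) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical sketch: return (centroids, memberships).

    memberships[i][j] is the degree to which profile i belongs to cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])

    # Random initial memberships, normalised so each row sums to 1.
    u: List[Vector] = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])

    exp = 1.0 / (m - 1.0)  # applied to squared distances: (d^2)^(1/(m-1)) = d^(2/(m-1))
    centroids: List[Vector] = []
    for _ in range(max_iter):
        # Centroid update: weighted mean of profiles with weights u_ij^m.
        centroids = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centroids.append(
                [sum(w[i] * data[i][d] for i in range(n)) / total for d in range(dim)]
            )

        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        max_change = 0.0
        for i in range(n):
            d2 = [_sq_dist(data[i], centroids[j]) for j in range(c)]
            if min(d2) == 0.0:
                # Profile sits exactly on a centroid: crisp membership there.
                zero = [j for j in range(c) if d2[j] == 0.0]
                new_row = [1.0 / len(zero) if j in zero else 0.0 for j in range(c)]
            else:
                new_row = [
                    1.0 / sum((d2[j] / d2[k]) ** exp for k in range(c))
                    for j in range(c)
                ]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        if max_change < tol:
            break
    return centroids, u


if __name__ == "__main__":
    # Toy two-condition profiles (made-up data): two well-separated groups.
    profiles = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
    cents, memb = fuzzy_c_means(profiles, c=2)
    # Harden the fuzzy partition by taking each profile's highest membership.
    print([max(range(2), key=lambda j: row[j]) for row in memb])
```

Unlike hierarchical clustering, which produces a crisp tree of nested groups, this sketch keeps graded memberships, so a gene whose profile lies between two groups can be reported as partially belonging to both; hardening by the largest membership, as in the usage block, recovers a conventional partition when one is needed.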