Cancer classification is an important research area that has attracted the attention of several research groups over recent decades. However, there is no generally agreed-upon approach for assigning tumors to known classes (a.k.a. class prediction). One challenge in microarray analysis, especially for cancer gene expression profiles, is to identify genes or groups of genes that are highly expressed in tumor cells but not in normal cells, and vice versa. All of the methods described in the literature deal with features obtained directly from the data. Further, several clustering techniques have been proposed for the analysis of genome expression data, such as k-means and self-organizing maps. However, these methods do not provide information about the influence of a given gene on the overall shape of the clusters. In this paper, we try to generate informative data that can be more powerful in the classification of genes. We identify a reduced set of features capable of distinguishing between two classes by two-stage clustering of genes using fuzzy c-means. In the first stage, the proposed method clusters the original data. In the second stage, it clusters the genes within each of the clusters produced by the first stage. We chose fuzzy c-means because a fuzzy model fits gene expression data analysis better: a gene can belong to different classes, with a degree of membership per class. However, choosing the fuzziness parameter m is a major difficulty in applying fuzzy c-means for clustering. In this approach, we try to better identify the value of the fuzziness parameter when applying fuzzy c-means to microarray data. Support vector machines combined with different kernel functions are used for classification. Results from experiments conducted on three benchmark data sets (including one multi-class data set) demonstrate the applicability and effectiveness of the proposed approach compared with the other approaches described in the literature.
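As an illustration of the two-stage idea, the following is a minimal sketch of standard fuzzy c-means applied twice, written in plain Python. It is not the authors' implementation. The function names (`fuzzy_c_means`, `two_stage`), the default m = 2.0, the random initialization, and the choice to assign each gene to its highest-membership cluster before the second stage are all assumptions made for this example; the abstract does not specify these details, nor how the value of m is selected.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (hypothetical helper, not the paper's code).

    Returns (centers, u) where u[i][k] is the membership of point i in
    cluster k and every row of u sums to 1. Requires m > 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u[i][k] ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Membership update from squared distances to the centers.
        shift = 0.0
        for i in range(n):
            d2 = [sum((points[i][d] - centers[k][d]) ** 2 for d in range(dim))
                  for k in range(c)]
            if min(d2) == 0.0:
                # Point coincides with a center: give it full membership there.
                new = [1.0 if x == 0.0 else 0.0 for x in d2]
            else:
                new = [x ** (-1.0 / (m - 1.0)) for x in d2]
            total = sum(new)
            new = [v / total for v in new]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new
        if shift < tol:
            break
    return centers, u


def two_stage(points, c1, c2, m=2.0):
    """Cluster all points, then re-cluster the members of each cluster.

    Assumption: a point joins the first-stage cluster in which its
    membership is highest. Returns {first_stage_cluster: [index lists]}.
    """
    _, u = fuzzy_c_means(points, c1, m)
    groups = [[] for _ in range(c1)]
    for i, row in enumerate(u):
        groups[row.index(max(row))].append(i)
    result = {}
    for k, idx in enumerate(groups):
        if len(idx) < c2:
            # Too few members to split further; keep the group whole.
            result[k] = [idx] if idx else []
            continue
        _, u2 = fuzzy_c_means([points[i] for i in idx], c2, m, seed=k + 1)
        sub = [[] for _ in range(c2)]
        for j, row in enumerate(u2):
            sub[row.index(max(row))].append(idx[j])
        result[k] = sub
    return result
```

In this sketch, larger values of m spread each point's membership more evenly across clusters, while values close to 1 approach hard k-means assignments, which is why the choice of m matters for the resulting gene groups.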