Data pre-processing: a new algorithm for feature selection and data discretization
CSTST '08 Proceedings of the 5th international conference on Soft computing as transdisciplinary science and technology
International Journal of Data Mining and Bioinformatics
In this paper, we try to identify a reduced set of features capable of distinguishing between two classes by performing double clustering with fuzzy c-means. We chose fuzzy c-means because a fuzzy model is better suited to gene expression data analysis. Choosing the fuzziness parameter m is a major problem in applying the fuzzy c-means method to clustering. In this approach, we applied fuzzy c-means clustering with different fuzziness parameters to two forms of microarray data. Support vector machines with different kernel functions are used for classification. In the experiments conducted on the colon dataset, we observed that CSVM correctly classifies the whole training and test sets when the data is log2 transformed and m is close to 1.5.
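To make the role of the fuzziness parameter m concrete, the following is a minimal sketch of the standard fuzzy c-means updates in plain Python. It is not the authors' implementation. The function name fuzzy_c_means, its arguments, and the synthetic two-group data are assumptions made for illustration only; the colon dataset, the log2 transform, and the SVM step are not reproduced here.

```python
import math
import random


def fuzzy_c_means(points, n_clusters=2, m=1.5, n_iter=300, tol=1e-9, seed=0):
    """Hypothetical helper: standard fuzzy c-means on a list of numeric tuples.

    Returns (centers, U), where U[i][k] is the membership of point i in
    cluster k. Requires m > 1; smaller m gives crisper memberships.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        U.append([u / total for u in row])

    power = 2.0 / (m - 1.0)  # exponent used in the membership update
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(n_clusters):
            w = [U[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update: u_ik is proportional to dist(x_i, c_k) ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            inv = [max(math.dist(points[i], c), 1e-12) ** -power for c in centers]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, U[i])))
            U[i] = new_row

        # Stop once the memberships have stopped changing.
        if shift < tol:
            break
    return centers, U


# Synthetic stand-in data (an assumption, not microarray data): two separated groups.
data_rng = random.Random(42)
group_a = [(data_rng.gauss(0.0, 0.3), data_rng.gauss(0.0, 0.3)) for _ in range(20)]
group_b = [(data_rng.gauss(5.0, 0.3), data_rng.gauss(5.0, 0.3)) for _ in range(20)]

centers, U = fuzzy_c_means(group_a + group_b, n_clusters=2, m=1.5)

# Harden the fuzzy memberships into labels by taking the strongest cluster.
labels = [row.index(max(row)) for row in U]
```

Rerunning the sketch with several values of m shows the trade-off the abstract refers to: as m approaches 1 the memberships approach a hard assignment, and as m grows every point's memberships drift toward equal values across clusters.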