Feature Selection with Ensembles, Artificial Variables, and Redundancy Elimination
The Journal of Machine Learning Research
DTMBIO 2013: international workshop on data and text mining in biomedical informatics
Proceedings of the 22nd ACM international conference on information & knowledge management
Epidemiologic and phenotypic evidence indicates that breast and prostate cancers have strong pathological similarities. Analyzing pathological similarities between cancers can be beneficial in several respects, such as enabling knowledge transfer between cancer studies. To gain knowledge of the similarity between breast and prostate cancer pathology, genes that are affected in both carcinomas are investigated. Gene expression data from RNA-seq experiments, provided through the TCGA consortium, were used for gene selection. Gene selection was performed using an iterative SVM-based ensemble feature selection approach. Iterative SVM-based gene selection allows correlated gene expression values to be considered simultaneously, and the ensemble approach stabilizes the selection. As a result of the analysis, two genes, transglutaminase 4 (TGM4) and complement component 4A (C4A), were selected as commonly altered genes. A direct relationship of the two genes to the two cancers has not been confirmed. However, TGM4 is known to be associated with adenocarcinomas and C4A with ovarian cancer, which provides evidence that they may be pathologically important genes for the two cancers.