UAI '01 Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence
Extracting gene regulation information for cancer classification
Pattern Recognition
Selecting differentially expressed genes using minimum probability of classification error
Journal of Biomedical Informatics
Artificial Intelligence in Medicine
A review of feature selection techniques in bioinformatics
Bioinformatics
Monte Carlo feature selection for supervised classification
Bioinformatics
Computational Statistics & Data Analysis
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Bayesian binary kernel probit model for microarray based cancer classification and gene selection
Computational Statistics & Data Analysis
Gene feature extraction using T-test statistics and kernel partial least squares
ICONIP'06 Proceedings of the 13th international conference on Neural information processing - Volume Part III
With advances in microarray technology, many biomarker selection approaches have been proposed for cancer diagnosis. Marker sets are either selected by scoring genes on how well they discriminate between disease classes [1-4] or ranked by significance analysis without reference to the classification task. However, there is a pressing need for methods that integrate prior biological knowledge into the gene selection process. In this study, we propose to identify genes primarily by their relevance to the diagnostic outcome. Because gene expression is the combined effect of several biological processes, the microarray data are decomposed by singular value decomposition (SVD), and the eigenvectors that correspond to the biological effect of the clinical outcome are identified. The genes that play important roles in determining this biological effect are then detected. Genes are thus identified by the strength of their association with clinical outcomes, and the relationship between genes and clinical outcomes is analyzed. Monte Carlo simulations are then used to fine-tune the selected gene set in terms of classification accuracy. The approach was tested on four public data sets. Comparative studies show that the selected genes achieve higher classification accuracy. Graphical analysis shows that they are closely related to the cancer class. Statistical simulation shows that the gene set found by the proposed method is also less variable and comparatively invariant to external influences. The biological relevance of the selected genes is further discussed and validated through a literature study and an analysis of biological databases.
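
The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring rule, the classifier, or the Monte Carlo design, so everything below is an assumption made for illustration: the function names `rank_genes_by_outcome` and `mc_accuracy` are hypothetical, the outcome-related eigenvector is taken to be the right singular vector with the largest absolute correlation with a binary class label, genes are ranked by the magnitude of their loading on the matching left singular vector, and the Monte Carlo step is reduced to repeated random train/test splits with a nearest-centroid classifier. The data are synthetic.

```python
import numpy as np

def rank_genes_by_outcome(X, y):
    """Rank genes by their loading on the outcome-related SVD component.

    X: genes x samples expression matrix; y: binary labels (0/1) per sample.
    Assumption: the component is chosen by correlation with y.
    """
    Xc = X - X.mean(axis=1, keepdims=True)        # center each gene
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    keep = s > 1e-10 * s[0]                       # drop numerically null components
    yc = y - y.mean()
    # Retained rows of Vt are unit-norm and mean-zero, so this is |Pearson r|.
    assoc = np.abs(Vt[keep] @ yc) / np.linalg.norm(yc)
    k = int(np.argmax(assoc))                     # outcome-related component
    ranked = np.argsort(-np.abs(U[:, k]))         # strongest loadings first
    return ranked, k, float(assoc[k])

def mc_accuracy(X, y, genes, n_rounds=200, test_frac=0.3, seed=0):
    """Mean nearest-centroid accuracy over random train/test splits."""
    rng = np.random.default_rng(seed)
    Z = X[genes].T                                # samples x selected genes
    n = len(y)
    n_test = max(1, int(round(test_frac * n)))
    hits = []
    for _ in range(n_rounds):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        c0 = Z[train][y[train] == 0].mean(axis=0)  # class centroids
        c1 = Z[train][y[train] == 1].mean(axis=0)
        d0 = np.linalg.norm(Z[test] - c0, axis=1)
        d1 = np.linalg.norm(Z[test] - c1, axis=1)
        pred = (d1 < d0).astype(int)
        hits.append(np.mean(pred == y[test]))
    return float(np.mean(hits))

# Synthetic example: 200 genes, 40 samples, genes 0-4 shifted in class 1.
rng = np.random.default_rng(42)
n_genes, n_samples = 200, 40
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_genes, n_samples))
X[:5, y == 1] += 6.0

ranked, k, assoc = rank_genes_by_outcome(X, y)

# Tune the subset size by Monte Carlo accuracy over a few candidate sizes.
sizes = [2, 5, 10, 20]
acc = {m: mc_accuracy(X, y, ranked[:m]) for m in sizes}
best_size = max(acc, key=acc.get)
print("component:", k, "association:", round(assoc, 3))
print("accuracy by subset size:", acc, "chosen size:", best_size)
```

Because the ranking in this sketch uses all samples before the splits are drawn, the reported accuracies are optimistic. A stricter variant would repeat the SVD ranking inside each training split.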