Ensemble gene selection by grouping for microarray data classification
Journal of Biomedical Informatics
Multi-platform gene-expression mining and marker gene analysis
International Journal of Data Mining and Bioinformatics
Expert Systems with Applications: An International Journal
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Artificial Intelligence in Medicine
Motivation: In clinical bioinformatics, methods are needed for assessing the additional predictive value of microarray data compared to simple clinical parameters alone. Such methods should also provide an optimal prediction rule that makes use of the full potential of both types of data: ideally, they should be able to capture subtypes that are not identified by clinical parameters alone. Moreover, they should address the question of the additional predictive value of microarray data in a fair framework.
Results: We propose a novel but simple two-step approach based on random forests and partial least squares (PLS) dimension reduction. It embeds the idea of pre-validation suggested by Tibshirani and colleagues, which relies on an internal cross-validation to avoid overfitting. Our approach is fast and flexible, and can be used both for assessing the overall additional significance of the microarray data and for building optimal hybrid classification rules. Its efficiency is demonstrated through simulations and an application to breast cancer and colorectal cancer data.
Availability: Our method is implemented in the freely available R package ‘MAclinical’, which can be downloaded from http://www.stat.uni-muenchen.de/~socher/MAclinical
Contact: boulesteix@slcmsr.org
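The pre-validation idea referred to in the Results can be illustrated with a minimal sketch. It is not the MAclinical implementation: it assumes a single PLS component (whose weight vector is proportional to the covariance between the centered predictors and the centered response), it omits the random forest step, and the function names `first_pls_weights`, `pls_score` and `prevalidated_scores` are hypothetical. The point it shows is that each observation's microarray score is computed from a model fitted without that observation, so the score can later be compared fairly with clinical parameters.

```python
import random


def first_pls_weights(X, y):
    """Return the unit-norm weight vector of the first PLS component
    (proportional to Xc^T yc) together with the column means of X."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    w = [
        sum((X[i][j] - col_means[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    norm = sum(v * v for v in w) ** 0.5
    if norm > 0:
        w = [v / norm for v in w]
    return w, col_means


def pls_score(row, w, col_means):
    """Project one observation onto the first PLS component,
    centering with the training-set column means."""
    return sum((x - m) * v for x, m, v in zip(row, col_means, w))


def prevalidated_scores(X, y, n_folds=5, seed=0):
    """Pre-validation (assumed K-fold variant): the score of each
    observation comes from weights fitted on the other folds only."""
    n = len(X)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    scores = [None] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w, means = first_pls_weights(
            [X[i] for i in train], [y[i] for i in train]
        )
        for i in held_out:
            scores[i] = pls_score(X[i], w, means)
    return scores
```

In a second step, the pre-validated scores would be placed alongside the clinical parameters as inputs to a classifier such as a random forest. Because a held-out observation's own class label never enters its score, the microarray-derived predictor is not favoured by overfitting.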