A protocol for building and evaluating predictors of disease state based on microarray data

Authors:
Lodewyk F. A. Wessels; Marcel J. T. Reinders; Augustinus A. M. Hart; Cor J. Veenman; Hongyue Dai; Yudong D. He; Laura J. van 't Veer
Affiliations:
Department of Mediamatics, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands; Department of Mediamatics, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands; Department of Radiotherapy, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands; Department of Mediamatics, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands; Rosetta Inpharmatics LLC (a wholly owned subsidiary of Merck & Co., Inc.), 401 Terry Avenue N., Seattle, Washington 98109, USA; Rosetta Inpharmatics LLC (a wholly owned subsidiary of Merck & Co., Inc.), 401 Terry Avenue N., Seattle, Washington 98109, USA; Department of Pathology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
Venue:
Bioinformatics
Year:
2005

Abstract

Motivation: Microarray gene expression data are increasingly used to identify sets of marker genes that accurately predict disease development and outcome in cancer. Many computational approaches have been proposed to construct such predictors. However, there is as yet no objective way to evaluate whether a new approach truly improves on the current state of the art. In addition, no 'standard' computational approach has emerged that enables robust outcome prediction.

Results: An important contribution of this work is the description of a principled training and validation protocol that allows objective evaluation of the complete methodology for constructing a predictor. We review the possible choices of computational approaches, with specific emphasis on predictor choice and reporter selection strategies. Using this training-validation protocol, we evaluated different reporter selection strategies and predictors on six gene expression datasets of varying difficulty. We demonstrate that simple reporter selection strategies (forward filtering and shrunken centroids) work surprisingly well and outperform partial least squares in four of the six datasets. Similarly, simple predictors, such as the nearest mean classifier, outperform more complex classifiers. Our training-validation protocol provides a robust methodology for evaluating the performance of new computational approaches and for objectively comparing outcome predictions across different datasets.

Contact: l.f.a.wessels@ewi.tudelft.nl

Supplementary information: http://ict.ewi.tudelft.nl/index.php?option=com_pub&task=view&id=1983
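The abstract's central point is that the complete methodology, including reporter selection and the choice of how many reporters to use, has to be validated on samples that played no part in building the predictor. The sketch below illustrates that principle with a double (nested) cross-validation loop wrapped around a simple filter-based reporter ranking and a nearest mean classifier. It is not the authors' implementation: the signal-to-noise ranking used as a stand-in for "forward filtering", the stratified fold scheme, the fold counts, the synthetic data and every function name are assumptions chosen for illustration.

```python
import math
import random
from statistics import mean, pstdev

# Hypothetical reporter score: |mean0 - mean1| / (sd0 + sd1), computed per gene.
def snr_scores(X, y):
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append(abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12))
    return scores

# Assumed filter step: rank genes individually and keep the k best.
def top_k_reporters(X, y, k):
    scores = snr_scores(X, y)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]

# Nearest mean classifier: one mean vector per class over the selected genes.
def fit_nearest_mean(X, y, genes):
    means = {}
    for label in set(y):
        rows = [row for row, l in zip(X, y) if l == label]
        means[label] = [mean(r[g] for r in rows) for g in genes]
    return means

def predict_nearest_mean(means, x, genes):
    point = [x[g] for g in genes]
    return min(means, key=lambda label: math.dist(point, means[label]))

# Stratified k-fold split: each class is spread evenly over the folds.
def stratified_folds(y, n_folds, seed=0):
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, l in enumerate(y) if l == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds

def split(X, y, test_idx):
    test_set = set(test_idx)
    train = [i for i in range(len(y)) if i not in test_set]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test_idx], [y[i] for i in test_idx])

# Reporter selection and classifier training see the training part only.
def count_errors(X_tr, y_tr, X_te, y_te, k):
    genes = top_k_reporters(X_tr, y_tr, k)
    means = fit_nearest_mean(X_tr, y_tr, genes)
    return sum(predict_nearest_mean(means, x, genes) != t for x, t in zip(X_te, y_te))

def cv_error(X, y, k, n_folds=5, seed=0):
    errors = sum(count_errors(*split(X, y, test_idx), k)
                 for test_idx in stratified_folds(y, n_folds, seed))
    return errors / len(y)

# Outer loop estimates performance; inner loop picks the number of reporters.
def nested_cv_error(X, y, candidate_ks, n_outer=5, n_inner=5, seed=0):
    errors = 0
    for test_idx in stratified_folds(y, n_outer, seed):
        X_tr, y_tr, X_te, y_te = split(X, y, test_idx)
        best_k = min(candidate_ks, key=lambda k: cv_error(X_tr, y_tr, k, n_inner, seed))
        errors += count_errors(X_tr, y_tr, X_te, y_te, best_k)
    return errors / len(y)

# Synthetic example: 40 samples, 50 genes, only gene 0 separates the classes.
rng = random.Random(1)
y = [0] * 20 + [1] * 20
X = []
for label in y:
    row = [rng.uniform(-1, 1) for _ in range(50)]
    row[0] = 10.0 * label + rng.uniform(-0.5, 0.5)
    X.append(row)

print(nested_cv_error(X, y, (1, 5, 10)))
```

The design choice the sketch is meant to highlight is where selection happens: ranking reporters on the full dataset before cross-validation would leak information from the test samples into training and bias the error estimate downward. Here both the reporter ranking and the choice of k are repeated inside every outer training fold, so the outer error estimate covers the whole construction procedure.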