Ensemble methods for classification of patients for personalized medicine with high-dimensional data

Authors:
Hojin Moon;Hongshik Ahn;Ralph L. Kodell;Songjoon Baek;Chien-Ju Lin;James J. Chen
Affiliations:
Department of Mathematics and Statistics, California State University-Long Beach, 1250 Bellflower Blvd., Long Beach, CA 90840, USA;Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794-3600, USA;Department of Biostatistics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA;Division of Biometry and Risk Assessment, National Center for Toxicological Research, FDA, Jefferson, AR 72079, USA;Division of Biometry and Risk Assessment, National Center for Toxicological Research, FDA, Jefferson, AR 72079, USA;Division of Biometry and Risk Assessment, National Center for Toxicological Research, FDA, Jefferson, AR 72079, USA
Venue:
Artificial Intelligence in Medicine
Year:
2007

Abstract

Objective: Personalized medicine uses the genomic signatures of patients in a target population to assign more effective therapies, improve diagnosis, and enable earlier interventions that might prevent or delay disease. Our objective is to find a novel classification algorithm that predicts response to therapy and thereby helps individualize the clinical assignment of treatment.

Methods and materials: Classification algorithms must be highly accurate to support optimal treatment of each patient. Typically, there are numerous genomic and clinical variables but relatively few patients, which makes it hard for most traditional classification algorithms to avoid over-fitting the data. We developed a robust classification algorithm for high-dimensional data based on ensembles of classifiers built from the optimal number of random partitions of the feature space. The software is available on request from the authors.

Results: The proposed algorithm is applied to genomic data sets on lymphoma patients and lung cancer patients to distinguish disease subtypes for optimal treatment, and to genomic data on breast cancer patients to identify those most likely to benefit from adjuvant chemotherapy after surgery. Its performance consistently ranks highly compared with the other classification algorithms.

Conclusion: The statistical classification method for individualized treatment of diseases developed in this study is expected to play a critical role in developing safer and more effective therapies that replace one-size-fits-all drugs with treatments that focus on specific patient needs.
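
The general idea named in the methods (an ensemble of classifiers built from random partitions of the feature space) can be illustrated with a minimal sketch. This is not the authors' software. The abstract does not specify the base classifier or how the optimal number of partitions is chosen, so the sketch assumes a nearest-centroid base learner, a fixed number of subsets and partitions, and plain majority voting. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_partition(n_features, n_subsets, rng):
    """Shuffle feature indices and deal them into disjoint subsets."""
    idx = list(range(n_features))
    rng.shuffle(idx)
    return [idx[i::n_subsets] for i in range(n_subsets)]


def fit_centroids(X, y, features):
    """Assumed base learner: per-class mean vector over the given features."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict_centroid(centroids, features, row):
    """Return the class whose centroid is closest in squared distance."""
    def dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(features, centroids[c]))
    return min(sorted(centroids), key=dist)


def fit_ensemble(X, y, n_subsets, n_partitions, seed=0):
    """Train one base learner per feature subset, for several partitions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_partitions):
        for feats in random_partition(len(X[0]), n_subsets, rng):
            models.append((feats, fit_centroids(X, y, feats)))
    return models


def predict_ensemble(models, row):
    """Majority vote over all base learners; ties go to the smallest label."""
    votes = Counter(predict_centroid(c, f, row) for f, c in models)
    best = max(votes.values())
    return min(label for label, v in votes.items() if v == best)


# Toy data (hypothetical): 6 samples, 6 features, two well-separated classes.
X = [
    [0.0, 0.2, 0.1, 0.3, 0.0, 0.1],
    [0.2, 0.1, 0.0, 0.1, 0.3, 0.2],
    [0.1, 0.0, 0.2, 0.2, 0.1, 0.0],
    [5.0, 4.8, 5.1, 4.9, 5.2, 5.0],
    [4.9, 5.1, 5.0, 5.2, 4.8, 4.9],
    [5.1, 5.0, 4.9, 5.0, 5.1, 5.2],
]
y = [0, 0, 0, 1, 1, 1]

models = fit_ensemble(X, y, n_subsets=3, n_partitions=4)
print(predict_ensemble(models, [0.1] * 6), predict_ensemble(models, [4.9] * 6))
```

Because each base learner sees only a small, disjoint slice of the features, no single model has to fit all variables at once, which is one plausible way such an ensemble could limit over-fitting when variables far outnumber patients. In practice the number of subsets would be tuned rather than fixed as it is here.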