Proteomic mass spectra classification using decision tree based ensemble methods

Authors:
Pierre Geurts;Marianne Fillet;Dominique De Seny;Marie-Alice Meuwis;Michel Malaise;Marie-Paule Merville;Louis Wehenkel
Affiliations:
Department of Electrical Engineering and Computer Science, University of Liège, 4000 Liège, Belgium;Laboratory of Clinical Chemistry and Rheumatology, CBIG - Centre of Biomedical Integrative Genoproteomics, University of Liège, 4000 Liège, Belgium;Laboratory of Clinical Chemistry and Rheumatology, CBIG - Centre of Biomedical Integrative Genoproteomics, University of Liège, 4000 Liège, Belgium;Laboratory of Clinical Chemistry and Rheumatology, CBIG - Centre of Biomedical Integrative Genoproteomics, University of Liège, 4000 Liège, Belgium;Laboratory of Clinical Chemistry and Rheumatology, CBIG - Centre of Biomedical Integrative Genoproteomics, University of Liège, 4000 Liège, Belgium;Laboratory of Clinical Chemistry and Rheumatology, CBIG - Centre of Biomedical Integrative Genoproteomics, University of Liège, 4000 Liège, Belgium;Department of Electrical Engineering and Computer Science, University of Liège, 4000 Liège, Belgium
Venue:
Bioinformatics
Year:
2005

Abstract

Motivation: Modern mass spectrometry allows the determination of proteomic fingerprints of body fluids such as serum, saliva or urine. These measurements can be used in many medical applications to diagnose the current state of a disease or predict its evolution. Recent developments in machine learning make it possible to exploit such datasets, which are characterized by small numbers of very high-dimensional samples.

Results: We propose a systematic approach based on decision tree ensemble methods, which is used to automatically determine proteomic biomarkers and predictive models. The approach is validated on two datasets of surface-enhanced laser desorption/ionization time-of-flight measurements, for the diagnosis of rheumatoid arthritis and inflammatory bowel diseases. The results suggest that the methodology can handle a broad class of similar problems.

Supplementary information: Additional tables, appendices and datasets may be found at http://www.montefiore.ulg.ac.be/~geurts/Papers/Proteomic-suppl.html

Contact: p.geurts@ulg.ac.be
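
Illustrative sketch (not the authors' implementation): the abstract describes tree ensembles that serve both as predictive models and as a way to single out biomarkers. The toy example below shows one common way this can work, assuming impurity-based feature ranking: a bagged ensemble of one-split trees (decision stumps) is grown on random feature subsets, and each feature is scored by the total Gini impurity reduction it produces. The synthetic data, the function names (`make_toy_spectra`, `best_stump`, `fit_ensemble`, `predict`) and all parameter values are assumptions made for illustration; the paper works with full trees on real SELDI-TOF spectra.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_stump(X, y, features):
    """Best single split among `features`: (gain, feature, threshold, left, right)."""
    n = len(y)
    parent = gini(y)
    best = None
    for f in features:
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds lie midway between consecutive distinct values,
        # so both sides of every split are non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or gain > best[0]:
                best = (gain, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best


def fit_ensemble(X, y, n_trees=200, n_candidates=5, seed=0):
    """Bagged stumps on random feature subsets, plus per-feature importance."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    stumps, importance = [], [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        feats = rng.sample(range(p), n_candidates)   # random feature subset
        stump = best_stump(Xb, yb, feats)
        if stump is None:                            # no valid split found
            continue
        gain, f, t, left_label, right_label = stump
        importance[f] += gain                        # accumulate impurity reduction
        stumps.append((f, t, left_label, right_label))
    return stumps, importance


def predict(stumps, row):
    """Majority vote over all stumps."""
    votes = Counter(ll if row[f] <= t else rl for f, t, ll, rl in stumps)
    return votes.most_common(1)[0][0]


def make_toy_spectra(n=40, p=20, informative=3, seed=1):
    """Synthetic stand-in for spectra: p noisy intensities, one class-dependent."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        row[informative] += 5.0 * label              # the planted "biomarker"
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_spectra()
stumps, importance = fit_ensemble(X, y)
top_feature = importance.index(max(importance))
# Accuracy on the training data itself, which is optimistic.
accuracy = sum(predict(stumps, row) == label for row, label in zip(X, y)) / len(y)
print("top-ranked feature:", top_feature)
print("training accuracy:", accuracy)
```

The few-samples, many-features setting mentioned in the Motivation is why the accuracy printed here should not be read as a performance estimate: it is computed on the same samples used for fitting, whereas a real study would rely on held-out or cross-validated estimates.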