A clustering based hybrid system for biomarker selection and sample classification of mass spectrometry data

Authors:
Pengyi Yang;Zili Zhang;Bing B. Zhou;Albert Y. Zomaya
Affiliations:
School of Information Technologies (J12), The University of Sydney, NSW 2006, Australia and NICTA, Australian Technology Park, Eveleigh, NSW 2015, Australia and Centre for Distributed and High Per ...;Faculty of Computer and Information Science, Southwest University, CQ 400715, China and School of Information Technology, Deakin University, Geelong, VIC 3217, Australia;School of Information Technologies (J12), The University of Sydney, NSW 2006, Australia and Centre for Distributed and High Performance Computing, The University of Sydney, NSW 2006, Australia;School of Information Technologies (J12), The University of Sydney, NSW 2006, Australia and Centre for Distributed and High Performance Computing, The University of Sydney, NSW 2006, Australia
Venue:
Neurocomputing
Year:
2010

Abstract

Mass spectrometry (MS)-based proteomics has become a standard approach to biomarker discovery and early disease detection at the proteome level. However, the data generated by MS technology are noisy, highly redundant and, most importantly, have a low sample-to-dimension ratio. Successful analysis of MS data therefore relies heavily on the algorithms and data mining techniques used to extract information. In this paper, we briefly discuss some of the algorithms commonly applied to mass-to-charge (m/z) feature selection, and propose a k-means clustering based hybrid algorithm to address the high-dimension and high-correlation issues associated with this task. The hybrid algorithm combines the advantages of filter- and wrapper-based feature selection algorithms. A special iterative procedure is introduced to overcome the instability of k-means clustering and of the genetic algorithm. By comparing the proposed hybrid system with several popular m/z feature selection algorithms using 10 different classifiers, we show that the proposed algorithm selects m/z features with lower correlation while achieving higher sample classification accuracy. The m/z feature selection results also indicate that the proposed hybrid algorithm is very stable despite its stochastic nature.
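
The clustering-plus-filter idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the filter criterion, so a two-class signal-to-noise ratio is assumed here, and the wrapper stage (the genetic algorithm with classifiers) and the iterative stabilization procedure are omitted. All function names (`zscore`, `kmeans`, `filter_score`, `select_representatives`) and the toy data are hypothetical. For reproducibility the sketch uses a deterministic farthest-first initialization, whereas standard k-means starts from random centroids, which is one source of the instability the abstract mentions.

```python
from statistics import mean, pstdev


def zscore(column):
    """Standardize one feature so clustering reflects profile shape, not scale."""
    mu, sd = mean(column), pstdev(column)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in column]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with farthest-first initialization (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [mean(dim) for dim in zip(*members)]
    return labels


def filter_score(column, y):
    """Assumed filter: signal-to-noise ratio between classes 0 and 1."""
    a = [v for v, lab in zip(column, y) if lab == 0]
    b = [v for v, lab in zip(column, y) if lab == 1]
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)


def select_representatives(X, y, k):
    """Cluster correlated features, then keep the best-scoring one per cluster."""
    columns = [list(col) for col in zip(*X)]  # one vector per m/z feature
    labels = kmeans([zscore(col) for col in columns], k)
    selected = []
    for c in range(k):
        members = [j for j, lab in enumerate(labels) if lab == c]
        if members:
            selected.append(max(members, key=lambda j: filter_score(columns[j], y)))
    return sorted(selected)


# Toy data: rows are samples, columns are four hypothetical m/z features.
# Features 0 and 1 are strongly correlated, as are features 2 and 3.
X = [
    [1, 1, 5, 4],
    [2, 3, 1, 1],
    [1, 1, 5, 5],
    [8, 7, 1, 1],
    [9, 9, 5, 5],
    [8, 8, 1, 2],
]
y = [0, 0, 0, 1, 1, 1]

selected = select_representatives(X, y, k=2)
print(selected)
```

Keeping one feature per cluster is what reduces correlation among the selected features in this sketch. In a hybrid scheme of the kind the abstract describes, a wrapper search would then evaluate candidate subsets with classifiers rather than relying on the filter score alone.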