A clustering based hybrid system for biomarker selection and sample classification of mass spectrometry data

  • Authors:
  • Pengyi Yang; Zili Zhang; Bing B. Zhou; Albert Y. Zomaya

  • Affiliations:
  • School of Information Technologies (J12), The University of Sydney, NSW 2006, Australia and NICTA, Australian Technology Park, Eveleigh, NSW 2015, Australia and Centre for Distributed and High Per ...
  • Faculty of Computer and Information Science, Southwest University, CQ 400715, China and School of Information Technology, Deakin University, Geelong, VIC 3217, Australia
  • School of Information Technologies (J12), The University of Sydney, NSW 2006, Australia and Centre for Distributed and High Performance Computing, The University of Sydney, NSW 2006, Australia
  • School of Information Technologies (J12), The University of Sydney, NSW 2006, Australia and Centre for Distributed and High Performance Computing, The University of Sydney, NSW 2006, Australia

  • Venue:
  • Neurocomputing
  • Year:
  • 2010

Abstract

Mass spectrometry (MS)-based proteomics has been established as a standard approach for biomarker discovery and early disease detection at the proteome level. However, MS data are noisy and highly redundant and, most importantly, have a low sample-to-dimension ratio. Successful analysis of MS data therefore relies heavily on the algorithmic and data mining techniques used to extract information from them. In this paper, we briefly discuss some commonly used algorithms for mass-to-charge (m/z) feature selection and propose a k-means clustering-based hybrid algorithm to address the high dimensionality and high correlation associated with this task. The hybrid algorithm combines the advantages of both filter- and wrapper-based feature selection, and a special iterative procedure is introduced to overcome the instability of k-means clustering and the genetic algorithm. By comparing the proposed hybrid system with several popular m/z feature selection algorithms using 10 different classifiers, we show that it selects m/z features with lower correlation while achieving higher sample classification accuracy. The m/z feature selection results also indicate that the proposed hybrid algorithm is very stable despite its stochastic nature.
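The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, built on several assumptions. It uses a Welch t-statistic as the filter criterion. It runs k-means on z-scored m/z intensity columns so that correlated features fall into the same cluster, keeps the highest-scoring feature of each cluster, and applies a simple vote across repeated k-means runs. That vote stands in for, and is not, the paper's iterative stabilizing procedure. The genetic-algorithm wrapper stage is omitted entirely, and all function names (`t_score`, `zscore`, `kmeans`, `select_features`) and parameters are hypothetical.

```python
import math
import random
from collections import Counter


def t_score(group_a, group_b):
    """Absolute Welch t-statistic between two classes (assumed filter criterion)."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    va = sum((v - ma) ** 2 for v in group_a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in group_b) / (nb - 1)
    denom = math.sqrt(va / na + vb / nb) or 1e-12
    return abs(ma - mb) / denom


def zscore(column):
    """Standardize one feature column (population standard deviation)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n) or 1e-12
    return [(v - mean) / sd for v in column]


def sq_dist(a, b):
    return sum((x - z) ** 2 for x, z in zip(a, b))


def kmeans(vectors, k, rng, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per vector."""
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(v, centers[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_features(X, y, n_filter=20, k=5, runs=10, seed=0):
    """Hypothetical hybrid-style selection: filter, then cluster, then vote.

    X: list of samples, each a list of m/z intensities; y: 0/1 class labels.
    Returns up to k feature indices, those chosen most often over `runs` restarts.
    """
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    scores = [t_score([v for v, lab in zip(col, y) if lab == 0],
                      [v for v, lab in zip(col, y) if lab == 1]) for col in columns]
    # Filter stage: keep the n_filter highest-scoring features.
    top = sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:n_filter]
    # For z-scored columns, squared Euclidean distance equals 2n(1 - r),
    # so features that end up close together are highly correlated.
    vectors = [zscore(columns[j]) for j in top]
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(runs):
        labels = kmeans(vectors, k, rng)
        for c in set(labels):
            members = [top[i] for i, lab in enumerate(labels) if lab == c]
            # One representative per cluster: its best filter score.
            votes[max(members, key=lambda j: scores[j])] += 1
    return [j for j, _ in votes.most_common(k)]
```

On synthetic data in which two columns are near-duplicates of one informative peak, the clustering step tends to keep only one of them. This is the lower-correlation behaviour the abstract describes. Comparing vote counts across seeds gives a rough view of how stable the selection is.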