Support vector classification of proteomic profile spectra based on feature extraction with the bi-orthogonal discrete wavelet transform

  • Authors:
  • Frank-Michael Schleif;Mathias Lindemann;Mario Diaz;Peter Maaß;Jens Decker;Thomas Elssner;Michael Kuhn;Herbert Thiele

  • Affiliations:
  • University of Leipzig, Department of Mathematics and Computer Science, Leipzig, Germany;University of Bremen, Zentrum für Technomathematik, FB 3, Bremen, Germany;University of Bremen, Zentrum für Technomathematik, FB 3, Bremen, Germany;University of Bremen, Zentrum für Technomathematik, FB 3, Bremen, Germany;Bruker Daltonik GmbH, Fahrenheitstrasse 4, 28359, Bremen, Germany;Bruker Daltonik GmbH, Fahrenheitstrasse 4, 28359, Bremen, Germany;LightTrans GmbH, Jena, Germany;Bruker Daltonik GmbH, Fahrenheitstrasse 4, 28359, Bremen, Germany

  • Venue:
  • Computing and Visualization in Science
  • Year:
  • 2009

Abstract

Automatic classification of high-resolution mass spectrometry data has increasing potential to support physicians in the diagnosis of diseases such as cancer. The proteomic data exhibit variations among different disease states. A precise and reliable classification of mass spectra is essential for successful diagnosis and treatment, and the process used to obtain such reliable classification results is therefore crucial. In this paper such a method is explained and a corresponding semi-automatic parameterization procedure is derived. The result is a simple, straightforward classification procedure that assigns mass spectra to a particular disease state. The method is based on an initial preprocessing stage applied to the whole set of spectra, followed by the bi-orthogonal discrete wavelet transform (DWT) for feature extraction. The approximation coefficients calculated from the scaling function match the peak pattern closely and also denoise the spectrum. The discriminating coefficients, selected by the Kolmogorov–Smirnov test, are finally used as features for training and testing a support vector machine with both a linear and a radial basis kernel. For comparison, the peak areas obtained with the ClinProt-System [33] were analyzed using the same support vector machines. The introduced approach was evaluated on clinical MALDI-MS data sets with two classes each, originating from cancer studies. The cross-validated error rates using the wavelet coefficients were better than those obtained from the peak areas.
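The feature extraction and selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the abstract does not name the bi-orthogonal wavelet, so the CDF 5/3 low-pass analysis filter (normalized to sum to 1) stands in for it; coefficients are ranked by the two-sample Kolmogorov–Smirnov statistic rather than by a p-value threshold; and the function names, the number of decomposition levels and the number of retained features are hypothetical. Preprocessing and the support vector machine itself are left out; the selected coefficients would simply be passed on as the feature vector.

```python
import random

# Low-pass analysis filter of the CDF 5/3 bi-orthogonal wavelet, scaled to
# sum to 1. Assumption: the abstract does not state which wavelet was used.
LOWPASS_53 = (-0.125, 0.25, 0.75, 0.25, -0.125)


def approximation_coefficients(signal, levels=1):
    """Return approximation coefficients after `levels` filter-and-decimate steps."""
    coeffs = list(signal)
    for _ in range(levels):
        n = len(coeffs)
        if n < 2:
            break
        out = []
        for center in range(0, n, 2):  # keep every second sample
            acc = 0.0
            for k, h in enumerate(LOWPASS_53):
                idx = center + k - 2
                # Symmetric reflection at both borders.
                if idx < 0:
                    idx = -idx
                if idx >= n:
                    idx = 2 * (n - 1) - idx
                acc += h * coeffs[idx]
            out.append(acc)
        coeffs = out
    return coeffs


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def select_features(class_a, class_b, n_keep):
    """Return the indices of the n_keep coefficients that best separate the classes."""
    n_features = len(class_a[0])
    scores = [
        ks_statistic([row[f] for row in class_a], [row[f] for row in class_b])
        for f in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    rng = random.Random(0)

    def synthetic_spectrum(with_peak):
        # Hypothetical toy data: Gaussian noise, plus a peak for one class.
        s = [rng.gauss(0.0, 0.5) for _ in range(64)]
        if with_peak:
            for pos in range(30, 34):
                s[pos] += 5.0
        return s

    healthy = [approximation_coefficients(synthetic_spectrum(False)) for _ in range(20)]
    disease = [approximation_coefficients(synthetic_spectrum(True)) for _ in range(20)]
    print(select_features(healthy, disease, n_keep=2))
```

Each decomposition level halves the number of coefficients, so the number of levels trades resolution along the mass axis against the amount of smoothing. The synthetic spectra in the demo are toy data only and say nothing about the error rates reported in the paper.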