Efficient model-based clustering for LC-MS data

Authors:
Marta Łuksza; Bogusław Kluge; Jerzy Ostrowski; Jakub Karczmarski; Anna Gambin
Affiliations:
Institute of Informatics, Warsaw University, Warsaw, Poland (Łuksza, Kluge, Gambin); Department of Gastroenterology, Medical Center for Postgraduate Education and Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Warsaw, Poland (Ostrowski, Karczmarski)
Venue:
WABI'06: Proceedings of the 6th International Conference on Algorithms in Bioinformatics
Year:
2006

Abstract

Proteomic mass spectrometry plays a growing role in diagnostics and in studies of protein complexes and biological systems, so high-throughput data processing is becoming increasingly important. Such data are imperfect: they contain noise and various errors introduced during experiments. In this paper we focus on the peak alignment problem. As an alternative to heuristic-based approaches to aligning peaks from different mass spectra, we propose a mathematically sound, model-based method. In this framework, experimental errors are modeled as deviations from the true values, and mass spectra are regarded as finite Gaussian mixtures. The advantage of this approach is that it provides convenient techniques for adjusting parameters and selecting the best-quality solutions. The method can be parameterized by imposing various constraints on the model. We investigate different classes of models and select the most suitable one. We analyze the results in terms of the statistically significant biomarkers that can be identified after the spectra are aligned.
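
The idea of treating pooled peaks as a finite Gaussian mixture can be illustrated with a small sketch. The abstract does not state the fitting procedure or the selection criterion, so everything below is an assumption made for illustration: the mixture is fitted by expectation-maximization, the constraint is a single variance shared by all components, and the number of aligned peak positions is chosen by the Bayesian information criterion (BIC). The function names `fit_gmm_shared_variance`, `bic`, and `align_peaks`, as well as the m/z values, are hypothetical and not taken from the paper.

```python
import math


def fit_gmm_shared_variance(values, k, iters=200, var_floor=1e-8):
    """Fit a 1-D Gaussian mixture with k components and one shared variance by EM.

    Returns (weights, means, variance, log_likelihood).
    """
    xs = sorted(values)
    n = len(xs)
    # Deterministic start (an assumption): means at evenly spaced quantiles.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    weights = [1.0 / k] * k
    centre = sum(xs) / n
    var = max(sum((x - centre) ** 2 for x in xs) / (n * k * k), var_floor)

    def log_joint(x):
        # log(w_j) + log N(x | mu_j, var) for every component j
        return [math.log(w) - 0.5 * math.log(2 * math.pi * var)
                - (x - m) ** 2 / (2 * var) for w, m in zip(weights, means)]

    def log_sum_exp(logs):
        top = max(logs)
        return top + math.log(sum(math.exp(v - top) for v in logs))

    for _ in range(iters):
        # E-step: responsibility of each component for each observed peak
        resp = []
        for x in xs:
            logs = log_joint(x)
            norm = log_sum_exp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate weights, means, and the shared variance
        nk = [max(sum(r[j] for r in resp), 1e-12) for j in range(k)]
        weights = [c / n for c in nk]
        means = [sum(r[j] * x for r, x in zip(resp, xs)) / nk[j]
                 for j in range(k)]
        var = max(sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, xs) for j in range(k)) / n,
                  var_floor)

    loglik = sum(log_sum_exp(log_joint(x)) for x in xs)
    return weights, means, var, loglik


def bic(loglik, k, n):
    # Free parameters: k means, k - 1 mixing weights, 1 shared variance.
    return -2.0 * loglik + (2 * k) * math.log(n)


def align_peaks(values, max_k):
    """Return (best_k, fit) for the component count with the lowest BIC."""
    fits = {k: fit_gmm_shared_variance(values, k) for k in range(1, max_k + 1)}
    best_k = min(fits, key=lambda k: bic(fits[k][3], k, len(values)))
    return best_k, fits[best_k]


# Hypothetical m/z values: three peaks, each observed in six spectra
# with small measurement deviations.
mz = [999.97, 999.99, 1000.00, 1000.00, 1000.01, 1000.03,
      1004.98, 1004.99, 1005.00, 1005.00, 1005.01, 1005.02,
      1009.97, 1009.99, 1010.00, 1010.01, 1010.01, 1010.02]

best_k, (weights, means, var, loglik) = align_peaks(mz, max_k=5)
print(best_k, [round(m, 2) for m in means])
```

Each fitted mean plays the role of an aligned peak position, and the shared variance is one possible reading of "assuming various constraints"; letting each component have its own variance would give a different model class to compare under the same criterion.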