A Partial Set Covering Model for Protein Mixture Identification Using Mass Spectrometry Data

Authors:
Zengyou He; Can Yang; Weichuan Yu
Affiliations:
Hong Kong University of Science and Technology, Hong Kong (all three authors)
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2011

Abstract

Protein identification is an essential step in mass spectrometry (MS) based proteome research. Many existing protein identification strategies search databases using either MS data or tandem MS (MS/MS) data. MS-based methods provide wider coverage than MS/MS-based methods, but their identification accuracy is lower because MS data carry less information than MS/MS data. More sophisticated algorithms are therefore needed to achieve higher identification accuracy from MS data. Peptide Mass Fingerprinting (PMF) has been widely used for many years to identify single purified proteins from MS data. In this paper, we extend this technique to protein mixture identification. First, we formulate protein mixture identification as a Partial Set Covering (PSC) problem. Then, we present several algorithms that solve the PSC problem efficiently. Finally, we extend the PSC model to MS/MS data and to the combination of MS data and MS/MS data. Experimental results on simulated and real data demonstrate the advantages of our method: 1) it significantly outperforms previous MS-based approaches; 2) it is useful in MS/MS-based protein inference; and 3) it combines MS data and MS/MS data in a unified model, which further improves identification performance.
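
As an illustration of the partial set covering idea, the sketch below treats each candidate protein as the set of observed peaks that its theoretical peptides could explain, and greedily selects proteins until a chosen number of peaks is covered, leaving the remaining peaks unexplained as noise. This is the standard greedy heuristic for partial cover, not necessarily one of the algorithms presented in the paper; the function name `greedy_partial_cover`, the `min_covered` parameter, and the toy peak and protein labels are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Set


def greedy_partial_cover(
    peaks: Set[Hashable],
    protein_peaks: Dict[str, Set[Hashable]],
    min_covered: int,
) -> List[str]:
    """Greedily pick proteins until at least `min_covered` peaks are explained.

    Hypothetical helper for illustration only. `protein_peaks` maps a protein
    name to the set of observed peaks it could explain.
    """
    if not 0 <= min_covered <= len(peaks):
        raise ValueError("min_covered must be between 0 and len(peaks)")

    uncovered = set(peaks)   # peaks not yet explained by any chosen protein
    chosen: List[str] = []   # proteins selected so far, in selection order
    covered_count = 0

    while covered_count < min_covered:
        # Pick the protein explaining the most still-uncovered peaks.
        # Iterating over sorted names makes ties resolve alphabetically.
        best = max(
            sorted(protein_peaks),
            key=lambda name: len(protein_peaks[name] & uncovered),
        )
        gain = protein_peaks[best] & uncovered
        if not gain:
            break  # no remaining protein explains a new peak; target unreachable
        chosen.append(best)
        uncovered -= gain
        covered_count += len(gain)

    return chosen


# Toy data (assumed): six observed peaks and three candidate proteins.
toy_peaks = {1, 2, 3, 4, 5, 6}
toy_proteins = {"A": {1, 2, 3}, "B": {3, 4, 5}, "C": {6}}

# Requiring only five of the six peaks lets peak 6 be treated as noise.
partial = greedy_partial_cover(toy_peaks, toy_proteins, min_covered=5)
# Requiring all six peaks forces the extra protein C into the answer.
full = greedy_partial_cover(toy_peaks, toy_proteins, min_covered=6)
```

Taking the target as an integer count rather than a fraction avoids floating-point rounding when deciding whether enough peaks have been covered. Relaxing the target from full to partial coverage is what keeps a single stray peak from pulling an extra protein into the result.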