A high performance algorithm for clustering of large-scale protein mass spectrometry data using multi-core architectures

Authors:
Fahad Saeed;Jason D. Hoffert;Mark A. Knepper
Affiliations:
National Institutes of Health (NIH), Bethesda, MD;National Institutes of Health (NIH), Bethesda, MD;National Institutes of Health (NIH), Bethesda, MD
Venue:
Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Year:
2013



Abstract

High-throughput mass spectrometers can produce thousands of peptide spectra from a single complex protein sample in a short amount of time. These data sets contain a substantial amount of redundancy, because the same peptide may be selected and identified multiple times in a single liquid chromatography mass spectrometry (LC-MS/MS) experiment. They also contain a substantial number of spectra that have a low signal-to-noise (S/N) ratio and may not be interpreted because of their poor quality. Recently, we presented a graph-theoretic algorithm, CAMS (Clustering Algorithm for Mass Spectra), for clustering mass spectrometry data. CAMS uses a novel metric, called an F-set, that identifies similar spectra with much higher accuracy and sensitivity than single-peak comparisons. In this paper we present a multithreaded algorithm, called P-CAMS, for clustering mass spectral data on multicore machines. The algorithm relies on intelligent matrix completion for graph construction and on a load-balancing scheme to achieve substantial speedups. We study the scalability of the proposed parallel algorithm on a multicore machine using synthetically generated spectra, with parameters carefully chosen to mimic real-world mass spectrometry datasets. Real experimental datasets were also generated to assess the quality of the clustering results. The results show that the proposed algorithm has scalable runtime performance and gives clustering results similar to those of the serial algorithm. The study also provides insight into the design of high performance algorithms for irregular problems in proteomics on many-core architectures.
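
The abstract does not spell out the F-set metric, the matrix completion step, or the load-balancing scheme, so the following is only a minimal sketch of the general pattern it describes: pairwise spectrum comparison split across workers, a similarity graph, and clusters read off as connected components. Everything in it is an assumption made for illustration, not the P-CAMS method: the Jaccard overlap of binned peaks stands in for the F-set metric, the greedy row assignment stands in for the paper's load balancing, and the names `binned`, `similarity`, `balanced_rows`, `edges_for_rows`, and `cluster` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def binned(peaks, width=1.0):
    """Map m/z values to integer bins (a stand-in, NOT the F-set metric)."""
    return frozenset(int(mz / width) for mz in peaks)

def similarity(a, b):
    """Jaccard overlap of two binned peak sets (assumed similarity measure)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def balanced_rows(n, workers):
    """Row i of the upper triangle costs n-1-i comparisons. Rows are visited
    heaviest first and each goes to the currently least-loaded worker."""
    loads = [0] * workers
    parts = [[] for _ in range(workers)]
    for i in range(n):
        w = loads.index(min(loads))
        parts[w].append(i)
        loads[w] += n - 1 - i
    return parts, loads

def edges_for_rows(rows, sets, threshold):
    """Compare each assigned row against all later spectra; keep similar pairs."""
    n = len(sets)
    return [(i, j) for i in rows for j in range(i + 1, n)
            if similarity(sets[i], sets[j]) >= threshold]

def cluster(spectra, threshold=0.5, workers=4):
    """Build the similarity graph in parallel, then return its connected
    components as sorted lists of spectrum indices."""
    sets = [binned(s) for s in spectra]
    parts, _ = balanced_rows(len(sets), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(lambda rows: edges_for_rows(rows, sets, threshold), parts)
        edges = [e for chunk in chunks for e in chunk]

    parent = list(range(len(sets)))  # union-find over spectrum indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(len(sets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

if __name__ == "__main__":
    toy = [[100.2, 200.5, 300.1], [100.4, 200.7, 300.3],
           [500.0, 600.0], [500.3, 600.2, 700.9]]
    print(cluster(toy, threshold=0.5, workers=2))
```

The row assignment matters because the upper-triangular comparison matrix is irregular: the first row holds n-1 comparisons and the last holds none, so equal-sized contiguous blocks of rows would leave some workers idle. In CPython, threads will not speed up this pure-Python loop because of the global interpreter lock; the sketch shows only how the work is partitioned.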