Exploiting Coarse-Grained Parallelism to Accelerate Protein Motif Finding with a Network Processor

Authors:
Ben Wun; Jeremy Buhler; Patrick Crowley
Affiliations:
Department of Computer Science and Engineering, Washington University in St. Louis; Department of Computer Science and Engineering, Washington University in St. Louis; Department of Computer Science and Engineering, Washington University in St. Louis
Venue:
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Year:
2005

Abstract

While general-purpose processors have only recently employed chip multiprocessor (CMP) architectures, network processors (NPs) have used heterogeneous multi-core architectures since the late 1990s. NPs differ qualitatively from workstation and server CMPs in that they replicate many simple, highly efficient processor cores on a chip, rather than a small number of sophisticated superscalar CPUs. In this paper, we compare the performance of one such NP, the Intel IXP 2850, to that of the Intel Pentium 4 when executing a scientific computing workload with a high degree of thread-level parallelism. Our target program, HMMer, is a bioinformatics tool that identifies conserved motifs in protein sequences. HMMer represents motifs as hidden Markov models (HMMs) and spends most of its time executing the well-known Viterbi algorithm to align proteins to these models. Our observations of HMMer on the IXP are therefore relevant to computations in many other domains that rely on the Viterbi algorithm. We show that the IXP achieves a speedup of 1.82 over the Pentium, despite the Pentium's 1.85x faster clock. Moreover, we argue that next-generation IXP NPs will likely provide a 10-20x speedup for our workload over the IXP 2850, in contrast to the 5-10x speedup expected from a next-generation Pentium-based CMP.
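
For readers unfamiliar with the Viterbi algorithm mentioned in the abstract, the following is a minimal sketch of the textbook dynamic program for a generic HMM, written in log space. It is not HMMer's implementation and does not model HMMer's profile HMM architecture. The function name `viterbi`, the two-state toy model (`B` for background, `M` for motif), the two-letter alphabet, and all probabilities are assumptions chosen purely for illustration.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for the observation sequence obs.

    All model parameters are log-probabilities stored in nested dicts.
    """
    # score[s] = best log-probability of any state path ending in s so far
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one dict per step: back[t][s] = best predecessor of s

    for symbol in obs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s
            prev = max(states, key=lambda p: score[p] + log_trans[p][s])
            new_score[s] = score[prev] + log_trans[prev][s] + log_emit[s][symbol]
            pointers[s] = prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state to recover the full path
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


def to_log(table):
    """Convert a dict (or dict of dicts) of probabilities to log space."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# Hypothetical toy model: "B" = background state, "M" = motif state.
STATES = ["B", "M"]
LOG_START = to_log({"B": 0.5, "M": 0.5})
LOG_TRANS = to_log({"B": {"B": 0.8, "M": 0.2},
                    "M": {"B": 0.2, "M": 0.8}})
LOG_EMIT = to_log({"B": {"A": 0.9, "C": 0.1},
                   "M": {"A": 0.1, "C": 0.9}})

if __name__ == "__main__":
    log_prob, path = viterbi("AACCA", STATES, LOG_START, LOG_TRANS, LOG_EMIT)
    print(path, log_prob)
```

In this sketch each sequence is scored independently of every other sequence, so a database search consists of many unrelated Viterbi runs. That independence is one plausible reading of the coarse-grained, thread-level parallelism the abstract refers to, though the abstract itself does not spell out how the work is partitioned.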