While general-purpose processors have only recently employed chip multiprocessor (CMP) architectures, network processors (NPs) have used heterogeneous multi-core architectures since the late 1990s. NPs differ qualitatively from workstation and server CMPs in that they replicate many simple, highly efficient processor cores on a chip, rather than a small number of sophisticated superscalar CPUs. In this paper, we compare the performance of one such NP, the Intel IXP 2850, to that of the Intel Pentium 4 when executing a scientific computing workload with a high degree of thread-level parallelism. Our target program, HMMer, is a bioinformatics tool that identifies conserved motifs in protein sequences. HMMer represents motifs as hidden Markov models (HMMs) and spends most of its time executing the well-known Viterbi algorithm to align proteins to these models. Our observations of HMMer on the IXP are therefore relevant to computations in many other domains that rely on the Viterbi algorithm. We show that the IXP achieves a speedup of 1.82 over the Pentium, despite the Pentium's 1.85x faster clock. Moreover, we argue that next-generation IXP NPs will likely provide a 10-20x speedup for our workload over the IXP 2850, in contrast to the 5-10x speedup expected from a next-generation Pentium-based CMP.
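For readers unfamiliar with the Viterbi algorithm named above, the following is a minimal sketch of it for a generic discrete HMM, computed in log space. It is not HMMer's profile-HMM implementation and not the code evaluated in the paper; the function name `viterbi`, the helper `to_log`, the two-state toy model (`"M"` for a motif-like state, `"B"` for background), and all probabilities are assumptions chosen purely for illustration.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for a generic discrete HMM."""
    # score[s] = best log-probability of any state path ending in s so far
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one dict per later symbol: state -> best predecessor
    for symbol in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            best = max(states, key=lambda p: prev[p] + log_trans[p][s])
            score[s] = prev[best] + log_trans[best][s] + log_emit[s][symbol]
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


def to_log(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(v) for j, v in row.items()} for k, row in table.items()}


# Hypothetical toy model: illustrative numbers only, not taken from the paper.
STATES = ("M", "B")
LOG_START = {"M": math.log(0.5), "B": math.log(0.5)}
LOG_TRANS = to_log({"M": {"M": 0.8, "B": 0.2}, "B": {"M": 0.2, "B": 0.8}})
LOG_EMIT = to_log({"M": {"A": 0.9, "C": 0.1}, "B": {"A": 0.1, "C": 0.9}})

best_log_prob, best_path = viterbi("AACC", STATES, LOG_START, LOG_TRANS, LOG_EMIT)
```

Each call scores one sequence against one model and shares no state with other calls, which is one plausible reading of why the workload exposes the thread-level parallelism the abstract describes.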