This paper presents and evaluates a model and a methodology for implementing parallel wavefront algorithms on the Cell Broadband Engine. Wavefront algorithms are vital in several application areas, such as computational biology, particle physics, and systems of linear equations. The model uses blocked data decomposition with pipelined execution of blocks across the synergistic processing elements (SPEs) of the Cell. To evaluate the model, we implement the Smith-Waterman sequence alignment algorithm as a wavefront algorithm and present key optimization techniques that complement the vector processing capabilities of the SPE. Our results show perfect linear speedup for up to 16 SPEs on QS20 dual-Cell blades, and the model predicts that the implementation would remain highly scalable on more cores, if available. Furthermore, the model's predictions are within 3% of the measured values on average. Lastly, we test the model in a throughput-oriented experimental setting, where we couple it with scheduling techniques that exploit parallelism across the simultaneous execution of multiple sequence alignments. Using the model, we improved the throughput of realistic multisequence alignment workloads by up to 8% compared to FCFS (first-come, first-served) by trading off parallelism within alignments against parallelism across alignments.
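The blocked wavefront idea described above can be illustrated with a minimal sequential sketch. It is not the paper's SPE implementation: the function names, the block size, the linear gap penalty, and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are all assumptions chosen for illustration. The sketch fills the Smith-Waterman score matrix block by block, visiting blocks in anti-diagonal order; blocks on the same anti-diagonal depend only on blocks from earlier anti-diagonals, so they are the units that could be handed to separate cores.

```python
def sw_rowmajor(a, b, match=2, mismatch=-1, gap=-1):
    """Reference Smith-Waterman score (linear gap penalty), row by row."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


def sw_blocked_wavefront(a, b, block=4, match=2, mismatch=-1, gap=-1):
    """Same score, computed block by block along block anti-diagonals.

    Hypothetical illustration: the inner loop over `bi` visits blocks that
    are mutually independent, so it is the loop a parallel version would
    distribute across processing elements.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    nbi = (n + block - 1) // block  # number of block rows
    nbj = (m + block - 1) // block  # number of block columns
    for d in range(nbi + nbj - 1):  # one wavefront step per anti-diagonal
        for bi in range(max(0, d - nbj + 1), min(nbi, d + 1)):
            bj = d - bi
            # Fill one block; its top and left neighbours are already done.
            for i in range(bi * block + 1, min((bi + 1) * block, n) + 1):
                for j in range(bj * block + 1, min((bj + 1) * block, m) + 1):
                    s = match if a[i - 1] == b[j - 1] else mismatch
                    H[i][j] = max(0, H[i - 1][j - 1] + s,
                                  H[i - 1][j] + gap, H[i][j - 1] + gap)
                    best = max(best, H[i][j])
    return best
```

Both functions return the same score for any block size, which is a convenient check that the wavefront ordering respects the data dependencies. A real implementation would keep only block boundaries rather than the full matrix and would vectorize the inner loops; those steps are omitted here.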