An introduction to parallel algorithms
An introduction to parallel algorithms
Performance prediction of parallel processing systems: the PAMELA methodology
ICS '93 Proceedings of the 7th international conference on Supercomputing
Chip multiprocessing and the cell broadband engine
Proceedings of the 3rd conference on Computing frontiers
The potential of the cell processor for scientific computing
Proceedings of the 3rd conference on Computing frontiers
MPI Microtask for programming the cell broadband engineTM processor
IBM Systems Journal
Molecular dynamics simulations on commodity GPUs with CUDA
HiPC'07 Proceedings of the 14th international conference on High performance computing
Using many-core hardware to correlate radio astronomy signals
Proceedings of the 23rd international conference on Supercomputing
Proceedings of the 5th ACM/SPEC international conference on Performance engineering
Hi-index | 0.00 |
The performance potential of the Cell/B.E., as well as its availability, have attracted a lot of attention from various high-performance computing (HPC) fields. While computation intensive kernels proved to be exceptionally well suited for running on the Cell, irregular data-intensive applications are usually considered as poor matches. In this paper, we present our complete solution for enabling such a data-intensive application to run efficiently on the Cell/B.E. processor. Specifically, we target radioastronomy data gridding and degridding, two resembling imaging filters based on convolutional resampling. Our solution is based on building a high-level application model, used to evaluate parallelization alternatives. Next, we choose the one with the best performance potential, and we gradually exploit this potential by applying platform-specific and application-specific optimizations. After several iterations, our target application shows a speed-up factor between 10 and 20 on a dual-Cell blade when compared with the original application running on a commodity machine. Given these results, and based on our empirical observations, we are able to pinpoint a set of ten guidelines for parallelizing similar applications on the Cell/B.E. Finally, we conclude the Cell/B.E. can provide high performance for data-intensive applications at the price of increased programming efforts and with a significant aid from aggressive application-specific optimizations.