We present and compare a variety of parallelization approaches for a real-world case study on modern parallel and distributed computer architectures. Our case study is a production-quality, time-intensive algorithm for medical image reconstruction used in positron emission tomography (PET). We parallelize this algorithm for the main kinds of contemporary parallel architectures: shared-memory multiprocessors, distributed-memory clusters, graphics processing units (GPUs) using the CUDA framework, and the Cell processor. Finally, we show how various architectures can be accessed in a distributed Grid environment. The main contribution of the paper, besides the parallelization approaches themselves, is their systematic comparison with respect to four criteria: performance, programming comfort, accessibility, and cost-effectiveness. We report results of experiments on particular parallel machines of different architectures that confirm the findings of our systematic comparison.