In recent years, the development of standard processors has shifted from bigger, more complex, and faster cores to placing several simpler cores on one chip. This shift has also changed how programs are written to exploit the processing power of multiple cores on the same processor. Initially, programmers had to divide the work and distribute it to the available cores by hand, and had to manage threads themselves, in order to use more than one core. Today, several frameworks exist to relieve the programmer of such tasks. In this paper, we present five such frameworks for parallelization on shared-memory multi-core architectures: OpenMP, Cilk++, Threading Building Blocks, RapidMind, and OpenCL. To evaluate these frameworks, we investigate a real-world application from medical imaging, 2D/3D image registration. In an empirical study, a fine-grained data-parallel approach and a coarse-grained task-parallel approach are used to evaluate aspects such as usability, performance, and overhead of each framework.
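To make the two parallelization granularities concrete, the following is a minimal sketch of the manual, thread-based style that such frameworks aim to replace. It is not taken from the paper. The sum-of-squared-differences measure, the flat `std::vector<float>` image layout, and the names `ssd_range`, `ssd_data_parallel`, and `ssd_task_parallel` are all assumptions chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical similarity measure (an assumption, not the paper's):
// sum of squared differences over the pixel range [begin, end).
double ssd_range(const std::vector<float>& a, const std::vector<float>& b,
                 std::size_t begin, std::size_t end) {
    double sum = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
        const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sum += d * d;
    }
    return sum;
}

// Fine-grained data parallelism: ONE image comparison is split by hand
// into contiguous pixel ranges, one range per thread.
double ssd_data_parallel(const std::vector<float>& a, const std::vector<float>& b,
                         unsigned num_threads) {
    assert(a.size() == b.size());
    assert(num_threads > 0);
    const std::size_t n = a.size();
    std::vector<double> partial(num_threads, 0.0);  // one slot per thread
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = n * t / num_threads;
        const std::size_t end = n * (t + 1) / num_threads;
        workers.emplace_back([&a, &b, &partial, t, begin, end] {
            partial[t] = ssd_range(a, b, begin, end);
        });
    }
    for (auto& w : workers) w.join();
    double total = 0.0;
    for (double p : partial) total += p;  // sequential reduction of partial sums
    return total;
}

// Coarse-grained task parallelism: each thread compares one COMPLETE
// candidate image against the reference image.
std::vector<double> ssd_task_parallel(const std::vector<float>& reference,
                                      const std::vector<std::vector<float>>& candidates) {
    std::vector<double> scores(candidates.size(), 0.0);
    std::vector<std::thread> workers;
    for (std::size_t c = 0; c < candidates.size(); ++c) {
        assert(candidates[c].size() == reference.size());
        workers.emplace_back([&reference, &candidates, &scores, c] {
            scores[c] = ssd_range(reference, candidates[c], 0, reference.size());
        });
    }
    for (auto& w : workers) w.join();
    return scores;
}
```

In this sketch the range splitting, thread creation, joining, and reduction are all written by hand, which is the bookkeeping the abstract describes. The data-parallel variant creates many short-lived units of work per comparison, while the task-parallel variant creates one larger unit of work per candidate image.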