In this paper, we present the analysis and development of a cross-platform OpenCL implementation of the box-counting algorithm, one of the most widely used methods for estimating the Fractal Dimension. The Fractal Dimension is a relevant image analysis measure used in several disciplines, but computing it is generally a time-consuming process, especially when working with 3D images. Unlike parallel programming models that strictly depend on the hardware type and manufacturer, such as CUDA, OpenCL allows us to provide an implementation suitable for execution on both GPUs and multi-core CPUs, whatever the hardware manufacturer. Sorting is a key part of the fast box-counting algorithm, and the final speedup is highly conditioned by the efficiency of the sorting algorithm used. Our study reveals that current OpenCL implementations of sorting algorithms are clearly slower than both CUDA implementations for GPUs and specific multi-core CPU implementations. Our OpenCL algorithm has been specifically optimized according to the type of the target device, and the results show an average speedup of up to 7.46× and 4× when executed on the GPU and the multi-core CPU respectively, both compared with the single-threaded (sequential) CPU implementation.
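To illustrate why sorting dominates the cost of this approach, the following is a minimal sequential sketch of sort-based box counting in Python. It is not the OpenCL implementation described above. The function names `count_boxes` and `box_counting_dimension`, the restriction to power-of-two box sizes, and the plain least-squares fit are all assumptions made for illustration. At each scale, every occupied voxel is mapped to the key of its enclosing box, the keys are sorted so that equal keys become adjacent, and one linear pass counts the distinct boxes.

```python
import math


def count_boxes(points, shift):
    """Count occupied boxes of side 2**shift (hypothetical helper).

    points: iterable of (x, y, z) non-negative integer voxel coordinates.
    """
    # Map each voxel to the key of the box that contains it, then sort.
    keys = sorted((x >> shift, y >> shift, z >> shift) for x, y, z in points)
    # After sorting, equal keys are adjacent, so a single pass counts
    # the number of distinct (occupied) boxes.
    count = 0
    prev = None
    for key in keys:
        if key != prev:
            count += 1
            prev = key
    return count


def box_counting_dimension(points, levels):
    """Estimate the fractal dimension as the least-squares slope of
    log N(s) against log(1/s) for box sizes s = 1, 2, 4, ...

    Assumes points is non-empty and levels >= 2.
    """
    points = list(points)
    xs, ys = [], []
    for shift in range(levels):
        size = 1 << shift
        xs.append(math.log(1.0 / size))
        ys.append(math.log(count_boxes(points, shift)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Usage: a solid 8x8x8 cube and a flat 8x8 sheet of voxels.
cube = [(x, y, z) for x in range(8) for y in range(8) for z in range(8)]
sheet = [(x, y, 0) for x in range(8) for y in range(8)]
cube_dim = box_counting_dimension(cube, 3)
sheet_dim = box_counting_dimension(sheet, 3)
```

For these two synthetic inputs the estimate should be close to 3 and 2 respectively, since the box counts shrink by a factor of 8 and 4 each time the box size doubles. In a parallel version, the sort over box keys is the step that would be handed to the device, which is consistent with the observation that the overall speedup depends heavily on the sorting routine.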