Multicore processors and a variety of accelerators have allowed scientific applications to scale to larger problem sizes. We compare several application accelerators executing a Quantum Monte Carlo application in terms of performance, design methodology, platform, and architecture. We evaluate the application's performance and programmability on a range of platforms: CUDA on Nvidia GPUs, Brook+ on ATI graphics accelerators, OpenCL on both multicore processors and GPUs, C++ on multicore processors, and a VHDL implementation on a Xilinx FPGA. We show that OpenCL provides application portability between multicore processors and GPUs, but may incur a performance cost. Furthermore, we illustrate that graphics accelerators can make simulations involving large numbers of particles feasible.