An approach for performance estimation of hybrid systems with FPGAs and GPUs as coprocessors
ARCS'12 Proceedings of the 25th international conference on Architecture of Computing Systems
Special-purpose hardware accelerators such as FPGAs and GPUs are commonly added to a computing system as separate devices. Consequently, the accelerator and the host system do not share a common memory, and offloading data to the additional hardware introduces a communication penalty. Combining information from a program's source code with execution profiling, we perform an analysis that uses arithmetic intensity as a cost function to identify the program parts best suited for offloading to the accelerator. The basic principles of this analysis are introduced and tested with a sample application. The concrete results are discussed and evaluated against the performance of an FPGA-based and a GPU-based implementation.
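The trade-off between computation and communication cost can be illustrated with a deliberately simple model. This sketch is not the authors' analysis. It assumes that the host and the accelerator run at fixed, hypothetical throughputs, that every byte a region needs crosses the host-accelerator link exactly once, and that transfer and computation do not overlap. Under these assumptions, offloading pays off only when a region's arithmetic intensity (operations per transferred byte) exceeds a break-even value. The names `Region`, `arithmetic_intensity`, `break_even_intensity` and `rank_candidates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A program region with (hypothetical) profiled characteristics."""
    name: str
    ops: float          # arithmetic operations executed, e.g. from profiling
    bytes_moved: float  # bytes that must cross the host-accelerator link


def arithmetic_intensity(r: Region) -> float:
    """Operations per byte transferred to/from the accelerator."""
    return r.ops / r.bytes_moved


def break_even_intensity(host_ops_per_s: float,
                         acc_ops_per_s: float,
                         link_bytes_per_s: float) -> float:
    """Minimum intensity at which offloading pays off in this toy model.

    Assumed model: host time = ops / host_rate and
    accelerator time = bytes / link_bw + ops / acc_rate (no overlap).
    Offloading wins when ops * (1/host_rate - 1/acc_rate) > bytes / link_bw,
    i.e. when ops / bytes exceeds the value returned here.
    """
    if acc_ops_per_s <= host_ops_per_s:
        # The accelerator is never faster at computing, so transfers cannot pay off.
        return float("inf")
    return 1.0 / (link_bytes_per_s * (1.0 / host_ops_per_s - 1.0 / acc_ops_per_s))


def rank_candidates(regions, threshold):
    """Names of regions above the threshold, highest intensity first."""
    good = [r for r in regions if arithmetic_intensity(r) > threshold]
    good.sort(key=arithmetic_intensity, reverse=True)
    return [r.name for r in good]


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    t = break_even_intensity(1e9, 1e10, 1e9)
    regions = [
        Region("filter", ops=8e6, bytes_moved=1e6),  # intensity 8
        Region("copy", ops=1e6, bytes_moved=4e6),    # intensity 0.25
        Region("fft", ops=5e7, bytes_moved=2e6),     # intensity 25
    ]
    print(f"break-even intensity: {t:.3f} ops/byte")
    print("offload candidates:", rank_candidates(regions, t))
```

With a host at 10^9 ops/s, an accelerator ten times faster and a 10^9 byte/s link, the break-even point is about 1.11 ops/byte, so the memory-bound `copy` region stays on the host while `fft` and `filter` are ranked as offload candidates. A real analysis would also account for overlapping transfers, data reuse across regions and device-specific throughput, which this sketch ignores.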