3-D brain MRI tissue classification on FPGAs
IEEE Transactions on Image Processing
State-of-the-art in heterogeneous computing
Scientific Programming
Challenging cloning related problems with GPU-based algorithms
Proceedings of the 4th International Workshop on Software Clones
A GPGPU transparent virtualization component for high performance computing clouds
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
A two-level real-time vision machine combining coarse- and fine-grained parallelism
Journal of Real-Time Image Processing
Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
CoRAM: an in-fabric memory architecture for FPGA-based computing
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
FPGA vs. multi-core CPUs vs. GPUs: hands-on experience with a sorting application
Facing the multicore-challenge
FPGA vs. multi-core CPUs vs. GPUs: hands-on experience with a sorting application
Facing the multicore-challenge
Evolutionary approach to improve wavelet transforms for image compression in embedded systems
EURASIP Journal on Advances in Signal Processing - Special issue on biologically inspired signal processing: analyses, algorithms and applications
Platform-aware bottleneck detection for reconfigurable computing applications
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Genetic Programming and Evolvable Machines
A fast, GPU based, dictionary attack to OpenPGP secret keyrings
Journal of Systems and Software
A performance and energy comparison of FPGAs, GPUs, and multicores for sliding-window applications
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
The "Chimera": an off-the-shelf CPU/GPGPU/FPGA hybrid computing platform
International Journal of Reconfigurable Computing - Special issue on High-Performance Reconfigurable Computing
Rapid computation of value and risk for derivatives portfolios
Concurrency and Computation: Practice & Experience
A performance and energy comparison of convolution on GPUs, FPGAs, and multicore processors
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
FPGA-based architecture to speed-up scientific computation in seismic applications
International Journal of High Performance Systems Architecture
C-to-CoRAM: compiling perfect loop nests to the portable CoRAM abstraction
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Local Interpolation-based Polar Format SAR: Algorithm, Hardware Implementation and Design Automation
Journal of Signal Processing Systems
Communications of the ACM
Efficient compilation of CUDA kernels for high-performance computing on FPGAs
ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors
pvFPGA: accessing an FPGA-based hardware accelerator in a paravirtualized environment
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
A framework for comparing high performance computing technologies
International Journal of Computational Science and Engineering
Journal of Real-Time Image Processing
Hi-index | 0.02 |
Accelerators are special purpose processors designed to speed up compute-intensive sections of applications. Two extreme endpoints in the spectrum of possible accelerators are FPGAs and GPUs, which can often achieve better performance than CPUs on certain workloads. FPGAs are highly customizable, while GPUs provide massive parallel execution resources and high memory bandwidth. Applications typically exhibit vastly different performance characteristics depending on the accelerator. This is an inherent problem attributable to architectural design, middleware support and programming style of the target platform. For the best application-to-accelerator mapping, factors such as programmability, performance, programming cost and sources of overhead in the design flows must be all taken into consideration. In general, FPGAs provide the best expectation of performance, flexibility and low overhead, while GPUs tend to be easier to program and require less hardware resources. We present a performance study of three diverse applications—Gaussian Elimination, Data Encryption Standard (DES), and Needleman-Wunsch—on an FPGA, a GPU and a multicore CPU system. We perform a comparative study of application behavior on accelerators considering performance and code complexity. Based on our results, we present an application characteristic to accelerator platform mapping, which can aid developers in selecting an appropriate target architecture for their chosen application.