Accelerating Compute-Intensive Applications with GPUs and FPGAs

Authors:
Shuai Che;Jie Li;Jeremy W. Sheaffer;Kevin Skadron;John Lach
Affiliations:
Department of Computer Science, University of Virginia. sc5nf@virginia.edu;Department of Electrical and Computer Engineering, University of Virginia. jl3yh@virginia.edu;Department of Computer Science, University of Virginia. jws9c@virginia.edu;Department of Computer Science, University of Virginia / NVIDIA Research. skadron@virginia.edu;Department of Electrical and Computer Engineering, University of Virginia. jlach@virginia.edu
Venue:
SASP '08 Proceedings of the 2008 Symposium on Application Specific Processors
Year:
2008

Citing 0
Cited 26

3-D brain MRI tissue classification on FPGAs

IEEE Transactions on Image Processing
State-of-the-art in heterogeneous computing

Scientific Programming
Challenging cloning related problems with GPU-based algorithms

Proceedings of the 4th International Workshop on Software Clones
A GPGPU transparent virtualization component for high performance computing clouds

EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
A two-level real-time vision machine combining coarse- and fine-grained parallelism

Journal of Real-Time Image Processing
Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
CoRAM: an in-fabric memory architecture for FPGA-based computing

Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
FPGA vs. multi-core CPUs vs. GPUs: hands-on experience with a sorting application

Facing the multicore-challenge
Evolutionary approach to improve wavelet transforms for image compression in embedded systems

EURASIP Journal on Advances in Signal Processing - Special issue on biologically inspired signal processing: analyses, algorithms and applications
Platform-aware bottleneck detection for reconfigurable computing applications

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Accelerating floating-point fitness functions in evolutionary algorithms: a FPGA-CPU-GPU performance comparison

Genetic Programming and Evolvable Machines
A fast, GPU based, dictionary attack to OpenPGP secret keyrings

Journal of Systems and Software
A performance and energy comparison of FPGAs, GPUs, and multicores for sliding-window applications

Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
The "Chimera": an off-the-shelf CPU/GPGPU/FPGA hybrid computing platform

International Journal of Reconfigurable Computing - Special issue on High-Performance Reconfigurable Computing
Rapid computation of value and risk for derivatives portfolios

Concurrency and Computation: Practice & Experience
A performance and energy comparison of convolution on GPUs, FPGAs, and multicore processors

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
FPGA-based architecture to speed-up scientific computation in seismic applications

International Journal of High Performance Systems Architecture
C-to-CoRAM: compiling perfect loop nests to the portable CoRAM abstraction

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
A high-performance, low-energy FPGA accelerator for correntropy-based feature tracking (abstract only)

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Local Interpolation-based Polar Format SAR: Algorithm, Hardware Implementation and Design Automation

Journal of Signal Processing Systems
Algorithmic trading review

Communications of the ACM
Efficient compilation of CUDA kernels for high-performance computing on FPGAs

ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors
pvFPGA: accessing an FPGA-based hardware accelerator in a paravirtualized environment

Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
A framework for comparing high performance computing technologies

International Journal of Computational Science and Engineering
A comprehensive comparison of GPU- and FPGA-based acceleration of reflection image reconstruction for 3D ultrasound computer tomography

Journal of Real-Time Image Processing

Abstract

Accelerators are special-purpose processors designed to speed up compute-intensive sections of applications. Two extreme endpoints in the spectrum of possible accelerators are FPGAs and GPUs, which can often achieve better performance than CPUs on certain workloads. FPGAs are highly customizable, while GPUs provide massive parallel execution resources and high memory bandwidth. Applications typically exhibit vastly different performance characteristics depending on the accelerator. This is an inherent problem attributable to the architectural design, middleware support and programming style of the target platform. For the best application-to-accelerator mapping, factors such as programmability, performance, programming cost and sources of overhead in the design flows must all be taken into consideration. In general, FPGAs provide the best expectation of performance, flexibility and low overhead, while GPUs tend to be easier to program and require fewer hardware resources. We present a performance study of three diverse applications, Gaussian Elimination, Data Encryption Standard (DES), and Needleman-Wunsch, on an FPGA, a GPU and a multicore CPU system. We perform a comparative study of application behavior on these accelerators, considering both performance and code complexity. Based on our results, we present a mapping from application characteristics to accelerator platforms, which can aid developers in selecting an appropriate target architecture for their chosen application.
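
As an illustration of why one of the studied applications maps differently onto different accelerators, the following is a minimal Python sketch of Needleman-Wunsch global alignment scoring. It is not the authors' implementation: the function name `needleman_wunsch_score` and the scoring parameters (`match`, `mismatch`, `gap`) are assumptions chosen for the example. The loop visits the score matrix one anti-diagonal at a time, since cells on the same anti-diagonal depend only on earlier diagonals and could in principle be computed in parallel.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned against gaps only

    # Walk the matrix by anti-diagonals (i + j == d). Every cell on one
    # anti-diagonal depends only on cells from the two previous diagonals,
    # so the inner loop has no dependencies between its iterations.
    for d in range(2, rows + cols - 1):
        first_i = max(1, d - (cols - 1))
        last_i = min(rows - 1, d - 1)
        for i in range(first_i, last_i + 1):
            j = d - i
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]
```

For example, `needleman_wunsch_score("ACGT", "ACGT")` returns 4 under these assumed scores, because all four positions match. The amount of independent work grows and then shrinks along the diagonals, which is one reason such a kernel can behave differently on a GPU, an FPGA, and a multicore CPU.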