A performance and energy comparison of FPGAs, GPUs, and multicores for sliding-window applications

Authors:
Jeremy Fowers; Greg Brown; Patrick Cooke; Greg Stitt
Affiliations:
University of Florida, Gainesville, FL, USA (all authors)
Venue:
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Year:
2012

Abstract

With the emergence of accelerator devices such as multicores, graphics-processing units (GPUs), and field-programmable gate arrays (FPGAs), application designers must search a huge design space in which performance and energy vary widely across accelerators, application domains, and use cases. To address this problem, numerous studies have evaluated specific applications across different accelerators. In this paper, we analyze an important application domain, sliding-window applications, executing on FPGAs, GPUs, and multicores. For each device, we present optimization strategies and identify the use cases where it is most effective. The results show that FPGAs can achieve speedups of up to 11x over GPUs and 57x over multicores, while also using orders of magnitude less energy.