With the emergence of accelerator devices such as multicores, graphics-processing units (GPUs), and field-programmable gate arrays (FPGAs), application designers face a huge design space in which performance and energy have been shown to vary widely across accelerators, application domains, and use cases. To address this problem, numerous studies have evaluated specific applications across different accelerators. In this paper, we analyze an important application domain, sliding-window applications, executing on FPGAs, GPUs, and multicores. For each device, we present optimization strategies and identify the use cases where it is most effective. The results show that FPGAs can achieve speedups of up to 11x over GPUs and up to 57x over multicores, while also using orders of magnitude less energy.
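For readers unfamiliar with the term, the sliding-window pattern can be illustrated with a minimal sketch. The abstract does not name a specific kernel, so the example below assumes a sum-of-absolute-differences (SAD) score as the per-window computation; the function name `sliding_window_sad` and the list-of-lists image layout are hypothetical choices made for illustration, not the paper's implementation.

```python
def sliding_window_sad(image, kernel):
    """Slide `kernel` over every position of `image` where it fully fits
    and return the sum of absolute differences (SAD) for each window.

    Both arguments are rectangular lists of lists of numbers
    (an assumed layout, chosen to keep the sketch dependency-free).
    """
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])

    scores = []
    # One output value per window position (top-left corner at row, col).
    for row in range(img_h - k_h + 1):
        score_row = []
        for col in range(img_w - k_w + 1):
            score = 0
            # Compare the window against the kernel element by element.
            for i in range(k_h):
                for j in range(k_w):
                    score += abs(image[row + i][col + j] - kernel[i][j])
            score_row.append(score)
        scores.append(score_row)
    return scores


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    kernel = [[1, 2],
              [4, 5]]
    print(sliding_window_sad(image, kernel))  # prints [[0, 4], [12, 16]]
```

Each window's score depends only on the input pixels and the kernel, never on another window's result, so the two outer loops can run in any order or in parallel. Neighboring windows also overlap heavily and therefore reread most of the same input values.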