Recent architectural trends have focused on increased parallelism via multicore processors and increased heterogeneity via accelerator devices (e.g., graphics-processing units and field-programmable gate arrays). Although these architectures offer significant performance and energy potential, application designers face many device-specific challenges when choosing an appropriate accelerator or customizing an algorithm for one. To help address this problem, in this article we thoroughly evaluate convolution, one of the most common operations in digital-signal processing, on multicores, graphics-processing units, and field-programmable gate arrays. Whereas many previous application studies evaluate one specific usage of an application, this article assists designers with design-space exploration for numerous use cases by analyzing the effects of different input sizes, different algorithms, and different devices, while also determining Pareto-optimal trade-offs between performance and energy.
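For readers unfamiliar with the operation under study, the following is a minimal sketch of direct (time-domain) 1D convolution in plain Python. It is an illustration only, not one of the implementations evaluated in the article; the function name `convolve_direct` and the sample inputs are assumptions made for this example.

```python
def convolve_direct(signal, kernel):
    """Full 1D convolution: y[n] = sum over k of signal[k] * kernel[n - k].

    Hypothetical helper for illustration; runs in O(len(signal) * len(kernel)).
    """
    if not signal or not kernel:
        return []
    out_len = len(signal) + len(kernel) - 1
    result = [0] * out_len
    # Each input sample contributes to len(kernel) consecutive outputs.
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            result[i + j] += s * k
    return result


if __name__ == "__main__":
    # Assumed sample data: a 3-sample signal and a 2-tap kernel.
    print(convolve_direct([1, 2, 3], [1, 1]))
```

The nested loop makes the cost grow with the product of signal and kernel lengths, which is one reason input size and algorithm choice (for example, a frequency-domain approach for large kernels) can change which device is preferable.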