The speedup over a microprocessor that can be achieved by implementing some programs on an FPGA has been extensively reported. This paper presents an architecture-level analysis, both quantitative and qualitative, of the components of this speedup. Obviously, the spatial parallelism that can be exploited on the FPGA is a major component. By itself, however, it does not account for the whole speedup.

In this paper we experimentally analyze the remaining components of the speedup. We compare the performance of image processing application programs executing in hardware on a Xilinx Virtex E2000 FPGA with their performance on three general-purpose processor platforms: MIPS, Pentium III and VLIW. The question we set out to answer is: what is the inherent advantage of a hardware implementation over a von Neumann platform? On the one hand, the clock frequency of general-purpose processors is about 20 times that of typical FPGA implementations. On the other hand, the iteration-level parallelism on the FPGA is one to two orders of magnitude greater than that on the CPUs. In addition to these two factors, we identify the efficiency advantage of FPGAs as an important factor and show that it ranges from 6 to 47 on our test benchmarks. We also identify some of the components of this factor: the streaming of data from memory, the overlap of control and data flow, and the elimination of some instructions on the FPGA. The results provide a deeper understanding of the tradeoff between system complexity and performance when designing configurable systems-on-chip (CSoC), as well as when designing software for CSoC. They also help explain the one to two orders of magnitude of speedup of FPGAs over CPUs that remains after accounting for clock frequencies.
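The three factors named in the abstract (clock frequency, iteration-level parallelism, and efficiency) can be related with a small back-of-the-envelope model. The sketch below assumes they combine multiplicatively: speedup ≈ (f_FPGA / f_CPU) × (parallelism_FPGA / parallelism_CPU) × efficiency. This product form, along with the names `SpeedupFactors` and `implied_efficiency`, is a hypothetical illustration and not the paper's exact definition. Under this model, the efficiency factor is whatever part of a measured speedup the clock and parallelism ratios leave unexplained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedupFactors:
    """Illustrative multiplicative model of FPGA-over-CPU speedup (an assumption)."""

    clock_ratio: float        # f_FPGA / f_CPU; roughly 1/20 per the abstract
    parallelism_ratio: float  # iteration-level parallelism, FPGA / CPU (about 10-100x)
    efficiency: float         # residual efficiency advantage of the FPGA

    def speedup(self) -> float:
        """Overall speedup predicted by the product of the three factors."""
        return self.clock_ratio * self.parallelism_ratio * self.efficiency


def implied_efficiency(speedup: float, clock_ratio: float, parallelism_ratio: float) -> float:
    """Efficiency factor needed to explain a measured speedup, given the other two ratios."""
    if clock_ratio <= 0 or parallelism_ratio <= 0:
        raise ValueError("clock_ratio and parallelism_ratio must be positive")
    return speedup / (clock_ratio * parallelism_ratio)


if __name__ == "__main__":
    # Hypothetical inputs chosen for illustration, not measurements from the paper.
    f = SpeedupFactors(clock_ratio=1 / 20, parallelism_ratio=50, efficiency=10)
    print(f.speedup())
    print(implied_efficiency(25.0, clock_ratio=1 / 20, parallelism_ratio=50))
```

With a 20× clock handicap and a 50× parallelism advantage, this model predicts a speedup of about 25 for an efficiency factor of 10. Working backwards, a measured speedup of 25 would imply an efficiency of about 10.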
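One of the efficiency components listed above, the streaming of data from memory, can be made concrete with a sliding-window example. The sketch below assumes a k×k box-sum filter as a stand-in for a typical image processing kernel, since the abstract does not name the benchmarks. It counts reads from the image array as a rough proxy for memory traffic and ignores CPU caches. A loop that fetches each window independently reads an interior pixel up to k² times. A streaming pipeline instead reads each pixel once in raster order and reuses it from on-chip row storage. Both function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]


def box_sum_naive(img: Image, k: int = 3) -> Tuple[Image, int]:
    """Each output re-reads its whole k x k window from the image array."""
    h, w = len(img), len(img[0])
    reads = 0
    out: Image = []
    for y in range(h - k + 1):
        row = []
        for x in range(w - k + 1):
            s = 0
            for dy in range(k):
                for dx in range(k):
                    s += img[y + dy][x + dx]
                    reads += 1  # every window element is fetched again
            row.append(s)
        out.append(row)
    return out, reads


def box_sum_streaming(img: Image, k: int = 3) -> Tuple[Image, int]:
    """Read each pixel exactly once, in raster order, buffering the last k rows."""
    h, w = len(img), len(img[0])
    reads = 0
    rows: deque = deque(maxlen=k)  # models on-chip row (line) buffers
    out: Image = []
    for y in range(h):
        line = []
        for x in range(w):
            line.append(img[y][x])  # the only read of this pixel
            reads += 1
        rows.append(line)  # oldest row is evicted automatically once k rows are held
        if len(rows) == k:
            # All k rows of the window are buffered, so emit one output row.
            out.append([sum(r[x + dx] for r in rows for dx in range(k))
                        for x in range(w - k + 1)])
    return out, reads


if __name__ == "__main__":
    img = [[y * 5 + x for x in range(5)] for y in range(4)]
    naive, naive_reads = box_sum_naive(img)
    streamed, streamed_reads = box_sum_streaming(img)
    print(naive == streamed, naive_reads, streamed_reads)
```

Both versions produce the same output. For a 5×4 image and a 3×3 window, the naive loop performs 54 reads and the streaming version performs 20. In hardware, the buffered rows would typically be held in block RAM and shift registers rather than a Python `deque`.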