Scientific application kernels mapped to reconfigurable hardware have been reported to achieve 10× to 100× speedups over equivalent software. These promising results suggest that reconfigurable logic could significantly accelerate applications in science and engineering. To assess the benefit of hardware acceleration accurately, however, one must consider the entire application, including its software components as well as the accelerated kernels. Relevant factors include alternative hardware/software partitionings, communication costs, and opportunities for concurrent computation in software and hardware. Analyzing these factors is beyond the scope of current automatic parallelizing compilers. This paper presents a case study in which a simulation of metropolitan road traffic networks is mapped onto a reconfigurable supercomputer, the Cray XD1. Five methods for mapping the application onto the combined hardware/software system are presented, and analytic equations are derived to approximate the performance of each. Both our analytical and our empirical results show that the key predictor of performance is not necessarily maximum parallelism. Performance also depends on the fraction of the problem that runs on the reconfigurable logic and on the volume of data flowing between software and hardware, factors that are often not considered in reported kernel speedups.
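The sketch below is a simple Amdahl-style cost model that shows why the offloaded fraction and the host-to-FPGA data volume can outweigh raw kernel speedup. It rests on our own simplifying assumptions: a single kernel speedup factor, a fixed transfer bandwidth, and host and FPGA phases that run one after another unless `overlap` is set. It is not the set of analytic equations derived in the paper, and the function and parameter names are hypothetical.

```python
def estimated_speedup(t_sw, f, kernel_speedup, bytes_moved, bandwidth, overlap=False):
    """Hypothetical whole-application speedup estimate for a hardware/software split.

    t_sw           -- runtime of the all-software version (seconds)
    f              -- fraction of t_sw spent in the part moved to the FPGA (0..1)
    kernel_speedup -- assumed speedup of that part when run on the FPGA
    bytes_moved    -- total data exchanged between host and FPGA (bytes)
    bandwidth      -- assumed sustained host<->FPGA bandwidth (bytes/second)
    overlap        -- if True, assume host work runs concurrently with FPGA work + transfers
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if kernel_speedup <= 0 or bandwidth <= 0:
        raise ValueError("kernel_speedup and bandwidth must be positive")

    t_host = (1.0 - f) * t_sw            # work that stays in software
    t_fpga = f * t_sw / kernel_speedup   # accelerated kernel time
    t_comm = bytes_moved / bandwidth     # time spent moving data across the link

    if overlap:
        # Concurrent model: the slower of the two sides bounds the runtime.
        t_total = max(t_host, t_fpga + t_comm)
    else:
        # Serial model: every phase adds to the runtime.
        t_total = t_host + t_fpga + t_comm
    return t_sw / t_total


# A 100x kernel helps little when only half of the work is offloaded.
print(round(estimated_speedup(100.0, 0.5, 100.0, 0, 1e9), 2))      # 1.98
# Offloading 95% helps far more, but 2 GB of transfers at 1 GB/s eat into it.
print(round(estimated_speedup(100.0, 0.95, 100.0, 0, 1e9), 2))     # 16.81
print(round(estimated_speedup(100.0, 0.95, 100.0, 2e9, 1e9), 2))   # 12.58
# Overlapping host work with FPGA work hides the transfer cost in this case.
print(round(estimated_speedup(100.0, 0.95, 100.0, 2e9, 1e9, overlap=True), 2))  # 20.0
```

Under these assumptions the kernel speedup mainly changes the small `t_fpga` term, while `f` and `bytes_moved` control the terms that dominate the total. This mirrors the abstract's point that maximum parallelism alone is a poor predictor of whole-application performance.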