Tiling optimizations for 3D scientific computations
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
An FPGA implementation of the two-dimensional finite-difference time-domain (FDTD) algorithm
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
Reflections on the memory wall
Proceedings of the 1st conference on Computing frontiers
FPGA-Based Acceleration of the 3D Finite-Difference Time-Domain Method
FCCM '04 Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Time Domain Numerical Simulation for Transient Waves on Reconfigurable Coprocessor Platform
FCCM '05 Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
An Efficient Implementation of High-Accuracy Finite Difference Computing Engine on FPGAs
ASAP '06 Proceedings of the IEEE 17th International Conference on Application-specific Systems, Architectures and Processors
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Finding Speedup in Parallel Processors
ISPDC '08 Proceedings of the 2008 International Symposium on Parallel and Distributed Computing
A perfectly matched layer for the absorption of electromagnetic waves
Journal of Computational Physics
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
Proceedings of the 37th annual international symposium on Computer architecture
Revisiting finite difference and spectral migration methods on diverse parallel architectures
Computers & Geosciences
Hi-index | 0.00 |
Memory-related constraints (memory bandwidth, cache size) are nowadays the performance bottleneck of most computational applications. Especially in the scenario of multiple cores, the performance does not scale with the number of cores in many cases. In our work, we present our FPGA-based solution for the 3D Reverse Time Migration (RTM) algorithm. As the most computationally demanding imaging algorithm in current oil and gas exploration, RTM involves various computational challenges, such as a high demand for storage size and bandwidth, and a poor cache behavior. Combining optimizations from both the algorithmic and architectural perspectives, our FPGA-based solution manages to remove the memory constraints and provide a high performance that can scale well with the amount of computational resources available. Compared with an optimized CPU implementation using two quad-core Intel Nehalem CPUs, our solution achieves 4x speedup on two Virtex-5 FPGAs, and 8x speedup on two Virtex-6 FPGAs. Our projection demonstrates that the performance will continue to scale with the future increase of FPGA capacities.