A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Tiling optimizations for 3D scientific computations
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
OpenMP: An Industry-Standard API for Shared-Memory Programming
IEEE Computational Science & Engineering
Impact of modern memory subsystems on cache optimizations for stencil computations
Proceedings of the 2005 workshop on Memory system performance
The potential of the cell processor for scientific computing
Proceedings of the 3rd conference on Computing frontiers
An Evaluation of Marenostrum Performance
International Journal of High Performance Computing Applications
Scientific computing Kernels on the cell processor
International Journal of Parallel Programming
Efficient Breadth-First Search on the Cell/BE Processor
IEEE Transactions on Parallel and Distributed Systems
Balancing productivity and performance on the cell broadband engine
CLUSTER '07 Proceedings of the 2007 IEEE International Conference on Cluster Computing
Interfacing computer aided parallelization and performance analysis
ICCS'03 Proceedings of the 2003 international conference on Computational science
State-of-the-art in heterogeneous computing
Scientific Programming
Introducing the semi-stencil algorithm
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Exploring Multi-Grained Parallelism in Compute-Intensive DEVS Simulations
PADS '10 Proceedings of the 2010 IEEE Workshop on Principles of Advanced and Distributed Simulation
International Journal of High Performance Computing Applications
Hi-index | 0.00 |
Reverse-Time Migration (RTM) is a state-of-the-art technique in seismic acoustic imaging, because of the quality and integrity of the images it provides. Oil and gas companies trust RTM with crucial decisions on multi-million-dollar drilling investments. But RTM requires vastly more computational power than its predecessor techniques, and this has somewhat hindered its practical success. On the other hand, despite multi-core architectures promise to deliver unprecedented computational power, little attention has been devoted to mapping efficiently RTM to multi-cores. In this paper, we present a mapping of the RTM computational kernel to the IBM Cell/B.E. processor that reaches close-to-optimal performance. The kernel proves to be memory-bound and it achieves a 98% utilization of the peak memory bandwidth. Our Cell/B.E. implementation outperforms a traditional processor (PowerPC 970MP) in terms of performance (with an 15.0× speedup) and energy-efficiency (with a 10.0× increase in the GFlops/W delivered). Also, it is the fastest RTM implementation available to the best of our knowledge. These results increase the practical usability of RTM. Also, the RTM-Cell/B.E. combination proves to be a strong competitor in the seismic arena.