Hardware/software co-design for energy-efficient seismic modeling

Authors:
Jens Krueger;David Donofrio;John Shalf;Marghoob Mohiyuddin;Samuel Williams;Leonid Oliker;Franz-Josef Pfreund
Affiliations:
Lawrence Berkeley National Laboratory, Berkeley, CA and Fraunhofer ITWM, Kaiserslautern, Germany;Lawrence Berkeley National Laboratory, Berkeley, CA;Lawrence Berkeley National Laboratory, Berkeley, CA;Lawrence Berkeley National Laboratory, Berkeley, CA and University of California at Berkeley, Berkeley, CA;Lawrence Berkeley National Laboratory, Berkeley, CA;Lawrence Berkeley National Laboratory, Berkeley, CA;Fraunhofer ITWM, Kaiserslautern, Germany
Venue:
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Year:
2011

Abstract

Reverse Time Migration (RTM) has become the standard for high-quality imaging in the seismic industry. RTM relies on PDE solutions using stencils that are 8th order or larger, which require large-scale HPC clusters to meet the computational demands. However, the rising power consumption of conventional cluster technology has prompted investigation of architectural alternatives that offer higher computational efficiency. In this work, we compare the performance and energy efficiency of three architectural alternatives: the Intel Nehalem X5530 multicore processor, the NVIDIA Tesla C2050 GPU, and a general-purpose manycore chip design optimized for high-order wave equations called "Green Wave." We have developed an FPGA-accelerated architectural simulation platform to accurately model the power and performance of the Green Wave design. Results show that across highly tuned high-order RTM stencils, the Green Wave implementation can offer up to 8x and 3.5x better energy efficiency per node than the Nehalem and GPU platforms, respectively. These results point to the enormous potential energy advantages of our hardware/software co-design methodology.
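
To make "8th order" concrete, the following is a minimal one-dimensional sketch of an 8th-order-accurate central-difference stencil for the second derivative, the kind of operator a wave-equation solver applies along each axis. It is an illustration under stated assumptions, not the paper's kernel: the weights are the standard textbook coefficients, the function name `second_derivative_8th` is hypothetical, and the RTM stencils evaluated in the paper are three-dimensional and heavily tuned. A stencil of this radius reads 4 neighbors on each side, so a 3D star-shaped version touches 1 + 3 x 8 = 25 grid points per update.

```python
from fractions import Fraction
import math

# Standard 8th-order-accurate central-difference weights for d^2/dx^2.
# COEFFS[0] is the center weight; COEFFS[k] applies to both neighbors at distance k.
COEFFS = [Fraction(-205, 72), Fraction(8, 5), Fraction(-1, 5),
          Fraction(8, 315), Fraction(-1, 560)]


def second_derivative_8th(u, h):
    """Approximate u'' on a uniform grid with spacing h (hypothetical helper).

    Only interior points (at least 4 cells from either end) are computed;
    boundary entries are left at 0.0 because boundary handling is out of scope.
    """
    radius = len(COEFFS) - 1
    c = [float(w) for w in COEFFS]
    out = [0.0] * len(u)
    for i in range(radius, len(u) - radius):
        acc = c[0] * u[i]
        for k in range(1, radius + 1):
            acc += c[k] * (u[i - k] + u[i + k])
        out[i] = acc / (h * h)
    return out


if __name__ == "__main__":
    h = 0.1
    xs = [i * h for i in range(41)]
    approx = second_derivative_8th([math.sin(x) for x in xs], h)
    # For u = sin(x) the exact second derivative is -sin(x).
    worst = max(abs(approx[i] + math.sin(xs[i])) for i in range(4, len(xs) - 4))
    print("max interior error:", worst)
```

Each output point needs nine input values along one axis, which is why high-order stencils of this kind tend to be limited by memory traffic and why the abstract frames the comparison in terms of energy efficiency rather than peak arithmetic rate.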