Optimizing sweep3d for graphic processor unit

Authors:
Chunye Gong;Jie Liu;Zhenghu Gong;Jin Qin;Jing Xie
Affiliations:
Department of Computer Sciences, National University of Defense Technology, Changsha, China;Department of Computer Sciences, National University of Defense Technology, Changsha, China;Department of Computer Sciences, National University of Defense Technology, Changsha, China;Department of Computer Sciences, National University of Defense Technology, Changsha, China;Department of Computer Sciences, National University of Defense Technology, Changsha, China
Venue:
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Year:
2010

Abstract

As a powerful and flexible processor, the Graphics Processing Unit (GPU) offers great capability for many high-performance computing applications. Sweep3D, which deterministically simulates single-group, time-independent discrete ordinates (Sn) neutron transport on a 3D Cartesian geometry, represents the key part of a real ASCI application. The wavefront process used for parallel computation in Sweep3D limits the number of concurrent threads on the GPU. In this paper, we present multi-dimensional optimization methods for Sweep3D that can be efficiently implemented on the fine-grained parallel architecture of the GPU. Our results show that the overall performance of Sweep3D on a CPU-GPU hybrid platform can be improved by up to 2.25 times compared to the CPU-based implementation.
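
To illustrate why a wavefront limits concurrency, the following minimal sketch counts how many grid cells lie on each diagonal plane of a 3D sweep. It assumes a single sweep starting from the corner cell (0, 0, 0), in which cell (i, j, k) depends on its upstream neighbours in each dimension, so that only cells with the same i + j + k can be processed at the same time. The function name wavefront_sizes and the 4 x 4 x 4 grid are hypothetical choices for illustration and are not taken from the paper.

```python
from collections import Counter
from itertools import product


def wavefront_sizes(nx, ny, nz):
    """Return the number of cells on each diagonal plane i + j + k = d.

    Assumption: a sweep from corner (0, 0, 0) where each cell waits on its
    upstream neighbours, so cells sharing the same d are independent.
    """
    counts = Counter(
        i + j + k for i, j, k in product(range(nx), range(ny), range(nz))
    )
    # d runs from 0 (the starting corner) to nx + ny + nz - 3 (the far corner).
    return [counts[d] for d in range(nx + ny + nz - 2)]


sizes = wavefront_sizes(4, 4, 4)
print(sizes)       # cells available to process concurrently at each step
print(max(sizes))  # peak concurrency over the whole sweep
```

Under these assumptions the sweep starts and ends with a single active cell, and the peak number of independent cells is far smaller than the total cell count, which is one way to see why a plain wavefront leaves many GPU threads idle.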