As a powerful and flexible processor, the Graphics Processing Unit (GPU) offers considerable capability for many high-performance computing applications. Sweep3D, which deterministically simulates single-group, time-independent discrete ordinates (Sn) neutron transport on a 3D Cartesian geometry, represents the key part of a real ASCI application. The wavefront process that Sweep3D uses for parallel computation limits the number of concurrent threads on the GPU. In this paper, we present multi-dimensional optimization methods for Sweep3D that can be implemented efficiently on the fine-grained parallel architecture of the GPU. Our results show that the overall performance of Sweep3D on a CPU-GPU hybrid platform can be improved by a factor of up to 2.25 compared with the CPU-based implementation.
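To illustrate why a wavefront sweep limits concurrency, the following minimal sketch counts how many cells can be processed at the same time in each step of a sweep from one corner of a 3D Cartesian grid. It assumes the usual dependency pattern of such sweeps, in which cell (i, j, k) needs its upstream neighbours (i-1, j, k), (i, j-1, k) and (i, j, k-1), so that only cells on the same diagonal plane i + j + k = d are independent. The function name `wavefront_sizes` and the grid dimensions are hypothetical and are not taken from Sweep3D or from the paper's implementation.

```python
from collections import Counter


def wavefront_sizes(nx, ny, nz):
    """Return the number of cells on each diagonal plane i + j + k = d.

    Assumption: a sweep starting at corner (0, 0, 0), where each cell
    depends on its three upstream neighbours. Cells that share the same
    value of i + j + k have no dependencies on each other, so the size of
    each plane is an upper bound on the parallel work in that step.
    """
    counts = Counter(
        i + j + k
        for i in range(nx)
        for j in range(ny)
        for k in range(nz)
    )
    # Planes are visited in order d = 0 .. (nx - 1) + (ny - 1) + (nz - 1).
    return [counts[d] for d in range(nx + ny + nz - 2)]


if __name__ == "__main__":
    sizes = wavefront_sizes(4, 4, 4)
    print("cells per wavefront step:", sizes)
    print("number of sequential steps:", len(sizes))
    print("peak concurrency:", max(sizes))
```

For a 4 x 4 x 4 grid the sweep needs 10 sequential steps, and the first and last steps each contain a single cell, so the available parallelism ramps up and down rather than staying at the full 64 cells. This is the effect the abstract refers to when it says the wavefront process limits the number of concurrent GPU threads.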