A Block-Asynchronous Relaxation Method for Graphics Processing Units
IPDPSW '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
In this paper, we analyze the potential of asynchronous relaxation methods on Graphics Processing Units (GPUs). We develop asynchronous iteration algorithms in CUDA and compare them with parallel implementations of synchronous relaxation methods on CPU- and GPU-based systems. For a set of test matrices from UFMC we investigate convergence behavior, performance, and tolerance to hardware failure. We observe that even our most basic asynchronous relaxation scheme can efficiently leverage the GPU's computing power and, despite its lower convergence rate compared to Gauss-Seidel relaxation, still provides solution approximations of a given accuracy in considerably shorter time than Gauss-Seidel running on CPUs or Jacobi running on GPUs. It thus overcompensates for the slower convergence by exploiting the scalability and the good fit of the asynchronous schemes to the highly parallel GPU architecture. Further, by enhancing the most basic asynchronous approach with hybrid schemes, which perform multiple iterations within the "subdomain" handled by a GPU thread block, we manage not only to recover the loss of global convergence but often to accelerate convergence by up to a factor of two, while keeping the execution time of a global iteration practically unchanged. Combined with the advantageous properties of asynchronous iteration methods with respect to hardware failure, this identifies the high potential of asynchronous methods for Exascale computing.
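The hybrid scheme described above, several Jacobi-type sweeps inside each "subdomain" while other subdomains' values remain stale, can be sketched in plain Python. This is only an illustration of the idea, not the paper's CUDA implementation; the function name, the block partitioning, and the iteration counts are assumptions chosen for clarity:

```python
import numpy as np

def block_async_jacobi(A, b, n_blocks=2, local_iters=3, global_iters=50):
    """Sketch of a block-asynchronous Jacobi iteration.

    Each block of unknowns (standing in for a GPU thread block's
    "subdomain") performs several local Jacobi sweeps, reading stale
    values for unknowns owned by the other blocks.
    """
    n = len(b)
    x = np.zeros(n)
    blocks = np.array_split(np.arange(n), n_blocks)
    D = np.diag(A)  # assumes nonzero diagonal, e.g. diagonally dominant A
    for _ in range(global_iters):
        x_stale = x.copy()  # snapshot that other blocks read (stale data)
        for blk in blocks:
            x_local = x_stale.copy()
            for _ in range(local_iters):
                # local Jacobi sweep: update only this block's unknowns
                r = b - A @ x_local
                x_local[blk] += r[blk] / D[blk]
            x[blk] = x_local[blk]
    return x
```

For diagonally dominant matrices the sketch converges to the solution of Ax = b; increasing `local_iters` mimics the paper's observation that extra local sweeps can recover and even accelerate global convergence without lengthening a global iteration much.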