Ever-increasing core counts create the need to develop parallel algorithms that avoid closely coupled execution across all cores. We present a performance analysis of several parallel asynchronous implementations of Jacobi's method for solving systems of linear equations, using MPI, SHMEM and OpenMP. In particular, we have solved systems of over 4 billion unknowns using up to 32,768 processes on a Cray XE6 supercomputer. We show that the precise implementation details of asynchronous algorithms can strongly affect the resulting performance and convergence behaviour of our solvers in unexpected ways, discuss how our specific implementations could be generalised to other classes of problem, and suggest how existing parallel programming models might be extended to allow asynchronous algorithms to be expressed more easily.
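As background, the following is a minimal sketch of the classical (synchronous) Jacobi iteration in Python/NumPy, not the paper's MPI/SHMEM/OpenMP implementations. The asynchronous variants studied in the paper differ in that each process updates its own block of the solution vector using possibly stale values of remote blocks, with no global synchronisation between sweeps; all names below are illustrative.

```python
import numpy as np

def jacobi(A, b, tol=1e-10, max_iter=1000):
    """Synchronous Jacobi: x_{k+1} = D^{-1} (b - (A - D) x_k).

    An asynchronous parallel variant would have each process apply this
    update to its local block of x using whatever (possibly out-of-date)
    remote values it currently holds, instead of a barrier per sweep.
    """
    D = np.diag(A)            # diagonal entries of A
    R = A - np.diagflat(D)    # off-diagonal remainder
    x = np.zeros_like(b, dtype=float)
    for _ in range(max_iter):
        x_new = (b - R @ x) / D
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

# A strictly diagonally dominant system, for which Jacobi converges.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x = jacobi(A, b)
```

Strict diagonal dominance guarantees convergence of the synchronous sweep; as the paper's results illustrate, the convergence behaviour of asynchronous variants is more delicate and depends on implementation details.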