Fast sparse matrix-vector multiplication for TeraFlop/s computers

Authors:
Gerhard Wellein; Georg Hager; Achim Basermann; Holger Fehske
Affiliations:
Regionales Rechenzentrum Erlangen, Erlangen, Germany; Regionales Rechenzentrum Erlangen, Erlangen, Germany; C&C Research Laboratories, NEC Europe Ltd, Sankt Augustin, Germany; Institut für Physik, Universität Greifswald, Greifswald, Germany
Venue:
VECPAR'02 Proceedings of the 5th international conference on High performance computing for computational science
Year:
2002

Citing 5
Cited 4

Performance of hybrid message-passing and shared-memory parallelism for discrete element modeling
    Proceedings of the 2000 ACM/IEEE conference on Supercomputing
A comparison of three programming models for adaptive applications on the Origin2000
    Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Performance modeling and tuning of an unstructured mesh CFD application
    Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Dual-Level Parallel Analysis of Harbor Wave Response Using MPI and OpenMP
    International Journal of High Performance Computing Applications
Communication Bandwidth of Parallel Programming Models on Hybrid Architectures
    ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing

Communication and Optimization Aspects on Hybrid Architectures
    Proceedings of the 9th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Approaches Based on Permutations for Partitioning Sparse Matrices on Multiprocessors
    The Journal of Supercomputing
Hierarchical hybrid grids: achieving TERAFLOP performance on large scale finite element simulations
    International Journal of Parallel, Emergent and Distributed Systems
Parallel symmetric sparse matrix-vector product on scalar multi-core CPUs
    Parallel Computing

Abstract

Eigenvalue problems involving very large sparse matrices arise in many fields of science. The numerical core of iterative eigenvalue algorithms is generally a matrix-vector multiplication (MVM) with the large sparse matrix. We present three different programming approaches for parallel MVM on present-day supercomputers. In addition to a pure message-passing approach, we introduce two hybrid parallel implementations that use the message-passing and shared-memory programming models simultaneously. For a modern SMP cluster (HITACHI SR8000), we discuss the performance and scalability of the hybrid implementations and compare them with the pure message-passing approach on massively parallel systems (CRAY T3E), vector computers (NEC SX5e) and distributed shared-memory systems (SGI Origin3800).
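
To make the role of the MVM concrete, the following is a minimal serial sketch in Python, not the authors' implementation. It assumes the compressed sparse row (CSR) storage format, which the abstract does not specify, and uses power iteration as a stand-in for the iterative eigenvalue algorithms mentioned above. The names `csr_matvec`, `blocked_matvec`, `power_iteration` and the small test matrix are hypothetical and chosen only for illustration. The row blocks mimic how rows might be distributed over processes or threads; here they are simply processed one after another.

```python
# Illustrative 3x3 tridiagonal matrix [[2,-1,0],[-1,2,-1],[0,-1,2]] in CSR form
# (assumed example data, not taken from the paper).
A_VALUES = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
A_COL_IDX = [0, 1, 0, 1, 2, 1, 2]
A_ROW_PTR = [0, 2, 5, 7]


def csr_matvec(values, col_idx, row_ptr, x, row_start=0, row_end=None):
    """Return rows row_start..row_end-1 of y = A x for a CSR matrix A."""
    if row_end is None:
        row_end = len(row_ptr) - 1
    y = []
    for i in range(row_start, row_end):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def blocked_matvec(values, col_idx, row_ptr, x, n_blocks):
    """Compute y = A x one contiguous row block at a time.

    Each block stands in for the rows owned by one process or thread.
    The blocks are independent of each other, so a parallel code could
    compute them concurrently; this sketch runs them sequentially.
    """
    n_rows = len(row_ptr) - 1
    bounds = [n_rows * b // n_blocks for b in range(n_blocks + 1)]
    y = []
    for b in range(n_blocks):
        y.extend(csr_matvec(values, col_idx, row_ptr, x,
                            bounds[b], bounds[b + 1]))
    return y


def power_iteration(values, col_idx, row_ptr, n_iter=100, n_blocks=2):
    """Estimate the eigenvalue of largest magnitude; one MVM per step."""
    n = len(row_ptr) - 1
    v = [1.0 / n ** 0.5] * n  # normalized start vector
    eigenvalue = 0.0
    for _ in range(n_iter):
        w = blocked_matvec(values, col_idx, row_ptr, v, n_blocks)
        # Rayleigh quotient v^T A v for the current unit vector v.
        eigenvalue = sum(vi * wi for vi, wi in zip(v, w))
        norm = sum(wi * wi for wi in w) ** 0.5
        v = [wi / norm for wi in w]
    return eigenvalue
```

In this sketch the MVM is the only operation whose cost grows with the number of nonzeros, which is why its parallel efficiency dominates the overall run time. In a distributed-memory setting each block would additionally need the remote entries of `x` referenced by its column indices, and that exchange is where message passing enters.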