The conjugate gradient (CG) algorithm is perhaps the best-known iterative technique for solving sparse linear systems that are symmetric and positive definite. For ill-conditioned systems, a preconditioner is often required. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(0)-preconditioned CG (PCG) across several programming paradigms and architectures. Results show that, for this class of applications: ordering significantly improves overall performance on both distributed-memory and distributed shared-memory systems; cache reuse may be more important than reducing communication; message-passing performance can be matched using shared-memory constructs through careful data ordering and distribution; and a hybrid MPI + OpenMP paradigm increases programming complexity with little performance gain. A multithreaded implementation of CG on the Cray MTA achieves high efficiency and scalability without special ordering or partitioning, a distinct advantage for adaptive applications; its scalability for PCG, however, is limited by a lack of thread-level parallelism.
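For readers unfamiliar with the method, the basic unpreconditioned CG iteration studied here can be sketched as follows. This is a minimal, serial, pure-Python illustration on a small dense system; the paper's solvers operate on large sparse matrices with MPI/OpenMP parallelism, and the function and variable names are ours, not the paper's.

```python
# Minimal sketch of the conjugate gradient (CG) iteration for a
# symmetric positive-definite system A x = b. Dense, serial, and
# for illustration only.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    n = len(b)
    x = [0.0] * n
    r = b[:]                 # residual r = b - A*x for x = 0
    p = r[:]                 # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)          # optimal step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 < tol:              # converged
            break
        beta = rs_new / rs_old               # direction update weight
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

# Example: a small SPD system; the exact solution is [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
```

The dominant cost per iteration is the sparse matrix-vector product (`matvec` above), which is exactly where the ordering and partitioning strategies examined in the paper determine cache reuse and communication volume.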