Ordering Unstructured Meshes for Sparse Matrix Computations on Leading Parallel Systems

Authors:
Leonid Oliker;Xiaoye S. Li;Gerd Heber;Rupak Biswas
Venue:
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Year:
2000

Abstract

Computer simulations of realistic applications usually require solving a set of non-linear partial differential equations (PDEs) over a finite region. The process of obtaining numerical solutions to the governing PDEs involves solving large sparse linear or eigen systems over the unstructured meshes that model the underlying physical objects. These systems are often solved iteratively, where the sparse matrix-vector multiply (SPMV) is the most expensive operation within each iteration. In this paper, we focus on the efficiency of SPMV using various ordering/partitioning algorithms. We examine different implementations using three leading programming paradigms and architectures. Results show that ordering greatly improves performance, and that cache reuse can be more important than reducing communication. However, a multithreaded implementation indicates that ordering and partitioning are not required on the Tera MTA to obtain an efficient and scalable SPMV.
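
To make the role of ordering in SPMV concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes compressed sparse row (CSR) storage, which the abstract does not specify, and uses a hand-picked renumbering of a tiny four-vertex chain mesh rather than any of the ordering algorithms the paper evaluates. The function names (`spmv_csr`, `bandwidth`, `permute_csr`) and the example data are hypothetical. Bandwidth is used here only as a rough proxy for how far apart in memory the entries of x touched by one row can lie.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed format)."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        # Accesses to x follow col_idx, so their locality depends on the ordering.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def bandwidth(row_ptr, col_idx):
    """Largest |i - j| over all stored entries (i, j)."""
    return max(
        (abs(i - col_idx[k])
         for i in range(len(row_ptr) - 1)
         for k in range(row_ptr[i], row_ptr[i + 1])),
        default=0,
    )


def permute_csr(row_ptr, col_idx, values, perm):
    """Symmetrically renumber rows and columns; perm[new] = old index."""
    inv = [0] * len(perm)
    for new, old in enumerate(perm):
        inv[old] = new
    new_ptr, new_col, new_val = [0], [], []
    for old in perm:
        # Relabel the columns of this row and keep them sorted.
        entries = sorted(
            (inv[col_idx[k]], values[k])
            for k in range(row_ptr[old], row_ptr[old + 1])
        )
        for j, v in entries:
            new_col.append(j)
            new_val.append(v)
        new_ptr.append(len(new_col))
    return new_ptr, new_col, new_val


# Hypothetical example: a chain of four mesh vertices numbered 0-3-1-2,
# with 2 on the diagonal and -1 for each edge.
row_ptr = [0, 2, 5, 7, 10]
col_idx = [0, 3, 1, 2, 3, 1, 2, 0, 1, 3]
values = [2, -1, 2, -1, -1, -1, 2, -1, -1, 2]
x = [1.0, 2.0, 3.0, 4.0]

perm = [0, 3, 1, 2]  # renumber the vertices along the chain
p_ptr, p_col, p_val = permute_csr(row_ptr, col_idx, values, perm)

y = spmv_csr(row_ptr, col_idx, values, x)
y_reordered = spmv_csr(p_ptr, p_col, p_val, [x[old] for old in perm])
```

Renumbering leaves the product unchanged up to the same permutation (`y_reordered[new]` equals `y[perm[new]]`), while the bandwidth of the small example drops from 3 to 1. On real meshes the effect on cache reuse depends on the machine and the ordering algorithm, which is the question the paper studies.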