Renumbering unstructured grids to improve the performance of codes on hierarchical memory machines
Advances in Engineering Software
A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs
SIAM Journal on Scientific Computing
Self-Avoiding Walks over Adaptive Unstructured Grids
Proceedings of the 11 IPPS/SPDP'99 Workshops Held in Conjunction with the 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing
Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator
FCRC '96/WACG '96 Selected papers from the Workshop on Applied Computational Geormetry, Towards Geometric Engineering
Reducing the bandwidth of sparse symmetric matrices
ACM '69 Proceedings of the 1969 24th national conference
A comparison of three programming models for adaptive applications on the Origin2000
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
A comparison of three programming models for adaptive applications on the origin2000
Journal of Parallel and Distributed Computing
Sparsity: Optimization Framework for Sparse Matrix Kernels
International Journal of High Performance Computing Applications
VECPAR'06 Proceedings of the 7th international conference on High performance computing for computational science
Hi-index | 0.00 |
Computer sim ulationsof realistic applications usually require solving a set of non-linear partial differential equations (PDEs) over a finite region. The process of obtaining numerical solutions to the governing PDEs involves solving large sparse linear or eigen systems over the unstructured meshes that model the underlying physical objects. These systems are often solved iterativ ely, where the sparse matrix-vector multiply (SPMV) is the most expensive operation within each iteration. In this paper, we focus on the efficiency of SPMV using various ordering/partitioning algorithms. We examine different implementations using three leading programming paradigms and architectures. Results show that ordering greatly improves performance, and that cache reuse can be more important than reducing communication. However, a multithreaded implementation indicates that ordering and partitioning are not required on the Tera MTA to obtain an efficient and scalable SPMV.