The NAS Conjugate Gradient (CG) benchmark is an important scientific kernel used to evaluate machine performance and to compare the characteristics of different programming models. The Global Arrays (GA) toolkit supports a shared-memory programming paradigm and gives the programmer control over data distribution and locality, both of which are important for optimizing performance on scalable architectures. In this paper, we describe and compare two parallelization strategies for the CG benchmark using GA and report performance results on a shared-memory system as well as on a cluster. The performance benefits of shared memory for irregular/sparse computations have been demonstrated before in the context of the CG benchmark using OpenMP. Similarly, the GA implementation outperforms the standard MPI implementation on a shared-memory system, in our case the SGI Altix. With GA, however, these benefits extend to distributed-memory systems, as we demonstrate on a Linux cluster with Myrinet.
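To make the programming model concrete, the following is a minimal sketch of the GA style the abstract refers to, written against the toolkit's C API: a distributed vector is created once, each process initializes the block it owns (the locality control mentioned above), and any process can then read an arbitrary remote section with a one-sided NGA_Get, much as a CG sparse matrix-vector product gathers vector entries. The vector length, array name, and initialization are illustrative choices, not details taken from the benchmark implementations compared in the paper.

/* Sketch: one-sided access to a distributed vector with Global Arrays.
 * Illustrative only; N and the fill values are placeholders. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include "ga.h"
#include "macdecls.h"

#define N 14000                          /* illustrative vector length */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);              /* GA runs over a message-passing runtime */
    GA_Initialize();
    MA_init(C_DBL, 1000000, 1000000);    /* stack/heap for GA internal buffers */

    int dims[1] = {N};
    int g_x = NGA_Create(C_DBL, 1, dims, "x", NULL);  /* NULL chunk: default distribution */

    /* Each process fills only the block it owns (locality control). */
    int lo[1], hi[1], ld[1];
    NGA_Distribution(g_x, GA_Nodeid(), lo, hi);
    if (lo[0] >= 0) {                    /* lo[0] == -1 means no local data */
        int n_local = hi[0] - lo[0] + 1;
        double *buf = malloc(n_local * sizeof(double));
        for (int i = 0; i < n_local; i++) buf[i] = 1.0;
        NGA_Put(g_x, lo, hi, buf, ld);   /* ld is ignored for 1-d arrays */
        free(buf);
    }
    GA_Sync();                           /* make all writes globally visible */

    /* One-sided read of a remote section, independent of ownership. */
    double x_remote[8];
    int rlo[1] = {0}, rhi[1] = {7};
    NGA_Get(g_x, rlo, rhi, x_remote, ld);
    if (GA_Nodeid() == 0)
        printf("x[0] = %f\n", x_remote[0]);

    GA_Destroy(g_x);
    GA_Terminate();
    MPI_Finalize();
    return 0;
}

The same source runs unchanged on a shared-memory machine such as the Altix and on a distributed-memory cluster; only the cost of the NGA_Get differs, which is why the data-distribution and locality decisions left to the programmer matter for performance.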