Adaptive applications have computational workloads and communication patterns that change unpredictably at runtime, so dynamic load balancing is required to achieve scalable performance on parallel machines. Implementing such adaptive applications efficiently in parallel is therefore a challenging task. In this paper, we compare the performance of, and the programming effort required for, two major classes of adaptive applications under three leading parallel programming models on an SGI Origin2000 system, a machine that supports all three models efficiently. Results indicate that the three models deliver comparable performance; however, even though the basic parallel algorithms are similar, the implementations differ significantly beyond merely using explicit messages versus implicit loads/stores. Compared with the message-passing (MPI) and SHMEM programming models, the cache-coherent shared address space (CC-SAS) model provides substantial ease of programming at both the conceptual and program-orchestration levels, often accompanied by performance gains. However, CC-SAS currently has portability limitations and may suffer from poor spatial locality of physically distributed shared data on large numbers of processors.
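The contrast between explicit messages and implicit loads/stores can be illustrated with a toy example. The sketch below is not from the paper: it runs a 1-D three-point averaging stencil two ways, once in a message-passing style (each simulated process owns a private slice and exchanges boundary "ghost" values explicitly) and once in a shared-address-space style (each simulated process simply reads its neighbors' elements directly from one shared array). All function and variable names are illustrative assumptions; real MPI or CC-SAS code would of course run the workers concurrently.

```python
def stencil_message_passing(data, nprocs):
    # Message-passing style: each process holds a private slice plus
    # ghost cells; boundary values are exchanged explicitly (the
    # send/receive is simulated here by direct copies between slices).
    n = len(data)
    chunk = n // nprocs
    slices = [data[p * chunk:(p + 1) * chunk] for p in range(nprocs)]
    # "Communicate" ghost cells with left/right neighbors.
    ghosts = []
    for p in range(nprocs):
        left = slices[p - 1][-1] if p > 0 else slices[p][0]
        right = slices[p + 1][0] if p < nprocs - 1 else slices[p][-1]
        ghosts.append((left, right))
    # Each process computes only on its private, extended slice.
    out = []
    for p in range(nprocs):
        gl, gr = ghosts[p]
        ext = [gl] + slices[p] + [gr]
        out.extend((ext[i - 1] + ext[i] + ext[i + 1]) / 3
                   for i in range(1, len(ext) - 1))
    return out


def stencil_shared_address_space(data, nprocs):
    # Shared-address-space style: no explicit communication; each
    # process reads neighboring elements with ordinary loads from the
    # shared array (a barrier between phases is implicit here because
    # the simulated processes run one after another).
    n = len(data)
    out = [0.0] * n
    chunk = n // nprocs
    for p in range(nprocs):
        for i in range(p * chunk, (p + 1) * chunk):
            lo, hi = max(i - 1, 0), min(i + 1, n - 1)
            out[i] = (data[lo] + data[i] + data[hi]) / 3
    return out
```

Both versions compute the same result; what differs is the orchestration burden. The message-passing version must pack, name, and exchange boundary data explicitly, which is exactly the kind of bookkeeping that grows complex for adaptive, unstructured workloads, whereas the shared-address-space version expresses communication implicitly through loads and stores.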