Adaptive applications have computational workloads and communication patterns that change unpredictably at runtime, requiring load balancing to achieve scalable performance on parallel machines. Efficient parallel implementation of such adaptive applications is therefore challenging. In this paper, we compare the performance of, and the programming effort required for, two major classes of adaptive applications under three leading parallel programming models on an SGI Origin 2000 system, a machine that supports all three models efficiently. Results indicate that the three models deliver comparable performance. However, the implementations differ significantly beyond merely using explicit messages versus implicit loads/stores, even though the basic parallel algorithms are similar. Compared with the message-passing (using MPI) and SHMEM programming models, the cache-coherent shared address space (CC-SAS) model provides substantial ease of programming at both the conceptual and program-orchestration levels, often accompanied by performance gains. However, CC-SAS currently has portability limitations and may suffer from poor spatial locality of physically distributed shared data on large numbers of processors.
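To make the explicit-messages-versus-implicit-loads/stores contrast concrete, here is a minimal sketch of the same parallel reduction written in both styles. It is written in Python with threads standing in for processors, purely for illustration: the paper's actual implementations use MPI, SHMEM, and CC-SAS on the Origin 2000, and the function names (`sas_sum`, `mp_sum`) are invented for this example.

```python
import threading
import queue

def sas_sum(data, nworkers=2):
    """Shared-address-space style: workers read the common array and update
    a shared result via ordinary loads/stores, with a lock for orchestration."""
    total = [0]
    lock = threading.Lock()

    def worker(chunk):
        s = sum(chunk)        # implicit loads from shared data
        with lock:
            total[0] += s     # implicit store to the shared result

    step = (len(data) + nworkers - 1) // nworkers
    threads = [threading.Thread(target=worker, args=(data[i * step:(i + 1) * step],))
               for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]

def mp_sum(data, nworkers=2):
    """Message-passing style (MPI-like, modeled with a queue): each worker owns
    its partition and sends an explicit message carrying its partial sum."""
    inbox = queue.Queue()

    def worker(chunk):
        inbox.put(sum(chunk))  # explicit "send" of the partial result

    step = (len(data) + nworkers - 1) // nworkers
    threads = [threading.Thread(target=worker, args=(data[i * step:(i + 1) * step],))
               for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # the "root" performs explicit receives and combines the partials
    return sum(inbox.get() for _ in range(nworkers))
```

Both versions compute the same answer; the difference is in orchestration. In the shared-address-space version, data partitioning is implicit and communication happens through ordinary memory references, whereas the message-passing version must explicitly partition ownership and name every transfer, which is the extra programming effort the abstract refers to.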