Adaptive applications have computational workloads and communication patterns that change unpredictably at runtime, requiring load balancing to achieve scalable performance on parallel machines. Efficient parallel implementation of such adaptive applications is therefore challenging. In this paper, we compare the performance of, and the programming effort required for, two major classes of adaptive applications under three leading parallel programming models on an SGI Origin 2000 system, a machine that supports all three models efficiently. Results indicate that the three models deliver comparable performance. However, the implementations differ significantly beyond merely using explicit messages versus implicit loads/stores, even though the basic parallel algorithms are similar. Compared with the message-passing (using MPI) and SHMEM programming models, the cache-coherent shared address space (CC-SAS) model provides substantial ease of programming at both the conceptual and program-orchestration levels, often accompanied by performance gains. However, CC-SAS currently has portability limitations and may suffer from poor spatial locality of physically distributed shared data on large numbers of processors.
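To make the explicit-messages-versus-implicit-loads/stores contrast concrete, here is a minimal sketch of the same parallel reduction written in both styles. It is written in Python with threads standing in for processors, purely for illustration: the paper's actual implementations use MPI, SHMEM, and CC-SAS on the Origin 2000, and the function names (`sas_sum`, `mp_sum`) are invented for this example.

```python
import threading
import queue

def sas_sum(data, nworkers=2):
    """Shared-address-space style: workers read the common array and update
    a shared result via ordinary loads/stores, with a lock for orchestration."""
    total = [0]
    lock = threading.Lock()

    def worker(chunk):
        s = sum(chunk)        # implicit loads from shared data
        with lock:
            total[0] += s     # implicit store to the shared result

    step = (len(data) + nworkers - 1) // nworkers
    threads = [threading.Thread(target=worker, args=(data[i * step:(i + 1) * step],))
               for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]

def mp_sum(data, nworkers=2):
    """Message-passing style (MPI-like, modeled with a queue): each worker owns
    its partition and sends an explicit message carrying its partial sum."""
    inbox = queue.Queue()

    def worker(chunk):
        inbox.put(sum(chunk))  # explicit "send" of the partial result

    step = (len(data) + nworkers - 1) // nworkers
    threads = [threading.Thread(target=worker, args=(data[i * step:(i + 1) * step],))
               for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # the "root" performs explicit receives and combines the partials
    return sum(inbox.get() for _ in range(nworkers))
```

Both versions compute the same answer; the difference is in orchestration. In the shared-address-space version, data partitioning is implicit and communication happens through ordinary memory references, whereas the message-passing version must explicitly partition ownership and name every transfer, which is the extra programming effort the abstract refers to.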