Adaptive applications have computational workloads and communication patterns that change unpredictably at runtime, so dynamic load balancing is required to achieve scalable performance on parallel machines. Implementing such adaptive applications efficiently in parallel is therefore a challenging task. In this paper, we compare the performance of, and the programming effort required for, two major classes of adaptive applications under three leading parallel programming models on an SGI Origin2000 system, a machine that supports all three models efficiently. Results indicate that the three models deliver comparable performance; however, even though the basic parallel algorithms are similar, the implementations differ significantly beyond merely using explicit messages versus implicit loads/stores. Compared with the message-passing (MPI) and SHMEM programming models, the cache-coherent shared address space (CC-SAS) model provides substantial ease of programming at both the conceptual and program-orchestration levels, often accompanied by performance gains. However, CC-SAS currently has portability limitations and may suffer from poor spatial locality of physically distributed shared data on large numbers of processors.
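The contrast between explicit messages and implicit loads/stores can be illustrated with a toy example. The sketch below is not from the paper: it runs a 1-D three-point averaging stencil two ways, once in a message-passing style (each simulated process owns a private slice and exchanges boundary "ghost" values explicitly) and once in a shared-address-space style (each simulated process simply reads its neighbors' elements directly from one shared array). All function and variable names are illustrative assumptions; real MPI or CC-SAS code would of course run the workers concurrently.

```python
def stencil_message_passing(data, nprocs):
    # Message-passing style: each process holds a private slice plus
    # ghost cells; boundary values are exchanged explicitly (the
    # send/receive is simulated here by direct copies between slices).
    n = len(data)
    chunk = n // nprocs
    slices = [data[p * chunk:(p + 1) * chunk] for p in range(nprocs)]
    # "Communicate" ghost cells with left/right neighbors.
    ghosts = []
    for p in range(nprocs):
        left = slices[p - 1][-1] if p > 0 else slices[p][0]
        right = slices[p + 1][0] if p < nprocs - 1 else slices[p][-1]
        ghosts.append((left, right))
    # Each process computes only on its private, extended slice.
    out = []
    for p in range(nprocs):
        gl, gr = ghosts[p]
        ext = [gl] + slices[p] + [gr]
        out.extend((ext[i - 1] + ext[i] + ext[i + 1]) / 3
                   for i in range(1, len(ext) - 1))
    return out


def stencil_shared_address_space(data, nprocs):
    # Shared-address-space style: no explicit communication; each
    # process reads neighboring elements with ordinary loads from the
    # shared array (a barrier between phases is implicit here because
    # the simulated processes run one after another).
    n = len(data)
    out = [0.0] * n
    chunk = n // nprocs
    for p in range(nprocs):
        for i in range(p * chunk, (p + 1) * chunk):
            lo, hi = max(i - 1, 0), min(i + 1, n - 1)
            out[i] = (data[lo] + data[i] + data[hi]) / 3
    return out
```

Both versions compute the same result; what differs is the orchestration burden. The message-passing version must pack, name, and exchange boundary data explicitly, which is exactly the kind of bookkeeping that grows complex for adaptive, unstructured workloads, whereas the shared-address-space version expresses communication implicitly through loads and stores.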