We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow computations on systems of meshes, but had not previously been captured in benchmarks. The new suite, named NPB Multi-Zone, is derived from the NAS Parallel Benchmarks (NPB) suite and involves solving the application benchmarks LU, BT, and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step the meshes exchange boundary value information. This strategy provides coarse-grain parallelism between meshes that is relatively easy to exploit. Three reference implementations are available: one serial, one hybrid using the Message Passing Interface (MPI) and OpenMP, and another hybrid using a shared-memory multi-level programming model (SMP+OpenMP). We examine the effectiveness of hybrid parallelization paradigms in these implementations on four different parallel computers. We also use an empirical formula to investigate the performance characteristics of the hybrid parallel codes.
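The multi-zone structure described above (independent per-mesh updates followed by a boundary exchange after each time step) can be illustrated with a minimal sketch. This is not the benchmark code. It assumes one-dimensional zones with a single ghost cell on each side, uses a simple explicit diffusion step as a stand-in for the LU, BT, and SP solvers, and uses a Python thread pool as a stand-in for the MPI and OpenMP layers. The names `update_zone`, `exchange_boundaries`, `run`, and `make_zones` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def update_zone(zone):
    """Advance one zone by one explicit diffusion step (stand-in for a real solver)."""
    new = zone[:]  # ghost cells at index 0 and -1 are copied unchanged
    for i in range(1, len(zone) - 1):
        new[i] = zone[i] + 0.25 * (zone[i - 1] - 2.0 * zone[i] + zone[i + 1])
    return new


def exchange_boundaries(zones):
    """Copy edge interior values into the ghost cells of adjacent zones."""
    for left, right in zip(zones, zones[1:]):
        left[-1] = right[1]   # first interior cell of the right-hand zone
        right[0] = left[-2]   # last interior cell of the left-hand zone


def run(zones, steps, workers=2):
    """Time-step all zones; coupling happens only in the exchange phase."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            # Coarse-grain level: each zone is updated independently.
            zones = list(pool.map(update_zone, zones))
            # Loose coupling: boundary values are exchanged once per step.
            exchange_boundaries(zones)
    return zones


def make_zones(n_zones=4, interior=8):
    """Build zones of zeros with a unit spike in the first interior cell of zone 0."""
    zones = [[0.0] * (interior + 2) for _ in range(n_zones)]
    zones[0][1] = 1.0
    return zones
```

Because the zones interact only during the exchange phase, the result does not depend on how many workers update them. In a hybrid implementation of the kind the abstract mentions, the loop inside `update_zone` would be the natural place for a second, finer level of parallelism.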