Accurate and timely prediction of weather phenomena, such as hurricanes and flash floods, requires high-fidelity, compute-intensive simulations of multiple finer regions of interest within a coarse simulation domain. Current weather applications execute these nested simulations sequentially, each using all the available processors, which is sub-optimal because the simulations scale sub-linearly. In this work, we present a strategy for executing multiple nested domain simulations in parallel by partitioning the 2-D processor grid into disjoint rectangular regions, one per domain. We propose a novel combination of performance prediction, processor allocation methods, and topology-aware mapping of the regions onto torus interconnects. Experiments on IBM Blue Gene systems using WRF show that the proposed strategies improve performance by up to 33% with topology-oblivious mapping, and by up to an additional 7% with topology-aware mapping, over the default sequential strategy.
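The abstract does not spell out the allocation method, so the following is only a minimal sketch of one generic way to partition a 2-D processor grid into disjoint rectangles, one per nested domain. It uses recursive bisection with cuts proportional to positive per-domain weights, which is an assumption made for illustration and not necessarily the authors' algorithm. The function name `partition_grid` and the example weights (standing in for predicted relative execution times) are hypothetical.

```python
from math import ceil


def partition_grid(x, y, w, h, weights):
    """Split the w-by-h rectangle at (x, y) into len(weights) disjoint
    rectangles whose areas roughly follow the given positive weights.

    Hypothetical helper for illustration: `weights` stands in for the
    predicted relative cost of each nested domain.
    Returns a list of (x, y, width, height) tuples, one per weight.
    """
    n = len(weights)
    if n == 0 or w * h < n:
        raise ValueError("need at least one processor per domain")
    if n == 1:
        return [(x, y, w, h)]

    # Split the domains into two groups and cut the rectangle so that
    # each group receives a share proportional to its total weight.
    mid = n // 2
    left, right = weights[:mid], weights[mid:]
    frac = sum(left) / (sum(left) + sum(right))

    # Always cut across the longer side to keep regions close to square.
    along_x = w >= h
    length, other = (w, h) if along_x else (h, w)

    # Clamp the cut so each side keeps at least one processor per domain.
    lo = ceil(len(left) / other)
    hi = length - ceil(len(right) / other)
    if lo > hi:
        raise ValueError("grid too small for this split")
    cut = min(max(round(length * frac), lo), hi)

    if along_x:
        return (partition_grid(x, y, cut, h, left)
                + partition_grid(x + cut, y, w - cut, h, right))
    return (partition_grid(x, y, w, cut, left)
            + partition_grid(x, y + cut, w, h - cut, right))


# Example: a 32 x 32 processor grid shared by four nested domains.
regions = partition_grid(0, 0, 32, 32, [4.0, 2.0, 1.0, 1.0])
for rect in regions:
    print(rect)
```

Cutting across the longer side keeps each region compact, which tends to suit a torus interconnect, but a topology-aware mapping of the kind the abstract mentions would also have to decide where each rectangle is placed on the physical network. This sketch does not attempt that.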