Cosmology simulations are highly communication-intensive, so topology-aware task mapping is critical for performance optimization. Multiprocessor clusters exhibit two performance gaps: one between inter-node and intra-node communication, and one between inter-socket and intra-socket communication. To exploit both, we design and develop a hierarchical task mapping scheme for cell-based AMR (Adaptive Mesh Refinement) cosmology simulations, in particular the ART application. Our scheme consists of two parts: (1) an inter-node mapping that assigns application processes to nodes with the objective of minimizing network traffic among nodes, and (2) an intra-node mapping within each node that minimizes the maximum size of messages transmitted between CPU sockets. Experiments on production supercomputers with 3D torus and fat-tree topologies show that our scheme reduces application communication cost by up to 50%. More importantly, the scheme is generic and can be extended to many other applications.
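The two objectives named above can be made concrete with a small sketch. This is not the paper's mapping algorithm. It only evaluates the two cost functions on a toy communication matrix and, for the intra-node step, uses a brute-force search that is feasible only for a handful of processes. The function names, the matrix values, the assumption of two equal-sized sockets per node, and the reading of "maximum size of messages" as the largest single pairwise volume are all illustrative assumptions.

```python
from itertools import combinations


def inter_node_traffic(comm, node_of):
    """Total volume exchanged between processes placed on different nodes.

    comm is a symmetric matrix of pairwise volumes (hypothetical data);
    node_of[i] is the node assigned to process i.
    """
    n = len(comm)
    return sum(comm[i][j]
               for i in range(n) for j in range(i + 1, n)
               if node_of[i] != node_of[j])


def max_inter_socket_message(comm, procs, socket_of):
    """Largest pairwise volume that crosses sockets within one node."""
    return max((comm[i][j] for i, j in combinations(procs, 2)
                if socket_of[i] != socket_of[j]), default=0)


def best_socket_split(comm, procs):
    """Brute-force split of one node's processes over two equal sockets.

    Assumption: two sockets per node with equal capacity. The search is
    exponential, so it serves only to illustrate the objective.
    """
    half = len(procs) // 2
    best = None
    for group in combinations(procs, half):
        socket_of = {p: (0 if p in group else 1) for p in procs}
        cost = max_inter_socket_message(comm, procs, socket_of)
        if best is None or cost < best[0]:
            best = (cost, socket_of)
    return best


# Toy symmetric communication matrix for four processes.
comm = [
    [0, 90, 10, 5],
    [90, 0, 5, 10],
    [10, 5, 0, 80],
    [5, 10, 80, 0],
]

# Inter-node objective: keeping heavy pairs on one node lowers traffic.
good = inter_node_traffic(comm, [0, 0, 1, 1])
bad = inter_node_traffic(comm, [0, 1, 0, 1])

# Intra-node objective: treat all four processes as sharing one node.
cost, socket_of = best_socket_split(comm, [0, 1, 2, 3])
```

In this toy case, placing the heavily communicating pairs (0, 1) and (2, 3) on the same node gives an inter-node traffic of 30, against 180 for the interleaved placement. The same grouping applied at the socket level keeps the largest cross-socket message at 10.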