A general method for optimizing problem layout on the Blue Gene®/L (BG/L) supercomputer is described. The method takes as input the communication matrix of an arbitrary problem, given as an array whose entry C(i, j) is the amount of data communicated from domain i to domain j. Given C(i, j), we first construct a heuristic map that sequentially places each domain and its communication neighbors either on the same BG/L node or on near-neighbor nodes of the BG/L torus, while keeping the number of domains mapped to each BG/L node constant. Starting from this map, we then generate a Markov chain of maps by Monte Carlo simulation with free energy F = Σ_{i,j} C(i, j) H(i, j), where H(i, j) is the smallest number of hops on the BG/L torus between the nodes holding domain i and domain j. The method was tested on two large parallel applications, SAGE and UMT2000, against the default Message Passing Interface (MPI) rank-order layout on up to 2,048 BG/L nodes. It produced maps that improved communication efficiency by up to 45%.
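The cost function and the Markov chain of maps can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`torus_hops`, `free_energy`, `anneal`), the pairwise-swap move, the Metropolis acceptance rule, and the geometric temperature schedule are all assumptions made for illustration, since the abstract specifies only the free energy F and the use of Monte Carlo simulation. Swapping the nodes of two domains keeps the number of domains per node unchanged, which matches the constraint stated above.

```python
import math
import random

def torus_hops(a, b, dims):
    """H(i, j): smallest hop count between coordinates a and b on a torus."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def free_energy(C, placement, dims):
    """F = sum over i, j of C[i][j] * H(i, j) for the given placement."""
    n = len(C)
    return sum(C[i][j] * torus_hops(placement[i], placement[j], dims)
               for i in range(n) for j in range(n) if C[i][j])

def anneal(C, placement, dims, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Metropolis chain over maps; the cooling schedule is an assumption."""
    rng = random.Random(seed)
    current = list(placement)          # current[i] = node coordinates of domain i
    f_cur = free_energy(C, current, dims)
    best, f_best = list(current), f_cur
    for step in range(steps):
        # Geometric interpolation from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Propose swapping the nodes of two distinct domains.
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        f_new = free_energy(C, current, dims)
        if f_new <= f_cur or rng.random() < math.exp((f_cur - f_new) / t):
            f_cur = f_new              # accept the proposed map
            if f_cur < f_best:
                best, f_best = list(current), f_cur
        else:
            current[i], current[j] = current[j], current[i]  # reject: undo swap
    return best, f_best
```

To try it, pass a square matrix `C`, a list `placement` of one coordinate tuple per domain, and the torus dimensions `dims`; the function returns the lowest-F map encountered and its F value. The sketch recomputes F in full after every swap for clarity. A practical version would update only the terms involving the two swapped domains.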