Mapping of MPI processes to the available resources is an increasingly complex but important task on modern parallel systems. This paper presents a new approach to optimize the process placement of a parallel application based on its I/O access pattern. The paper introduces the SetMatch process mapping algorithm, which significantly reduces the cost of the communication occurring in collective I/O operations. The effectiveness of the approach has been evaluated for multiple scenarios on a PVFS2 file system. Our results demonstrate significant improvements in the communication time of collective I/O operations as well as improvements in the overall application execution time with our mapping strategy. The generalized SetMatch algorithm was the only mapping strategy that was able to provide adequate performance for all scenarios used in this paper.
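The abstract does not describe how SetMatch works, so the following is not that algorithm. It is only a minimal sketch of why process placement affects the communication cost of a collective I/O operation, assuming a two-phase style exchange in which some ranks forward their data to aggregator ranks. The function name `inter_node_bytes`, the rank and node numbering, and the transfer sizes are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

# A transfer is (source rank, destination rank, bytes sent).
Transfer = Tuple[int, int, int]


def inter_node_bytes(transfers: List[Transfer], node_of: Dict[int, int]) -> int:
    """Sum the bytes that must cross a node boundary under a given placement.

    node_of maps each MPI rank to the node it is placed on (hypothetical IDs).
    Transfers between ranks on the same node are counted as free here, which
    is a simplifying assumption rather than a measured cost model.
    """
    return sum(
        nbytes
        for src, dst, nbytes in transfers
        if node_of[src] != node_of[dst]
    )


# Hypothetical access pattern: ranks 0 and 2 act as aggregators,
# rank 1 forwards its data to rank 0 and rank 3 forwards to rank 2.
transfers = [(1, 0, 4096), (3, 2, 4096)]

# Two candidate placements of four ranks on two nodes.
round_robin = {0: 0, 1: 1, 2: 0, 3: 1}  # senders and aggregators split apart
grouped = {0: 0, 1: 0, 2: 1, 3: 1}      # each sender shares a node with its aggregator

cost_round_robin = inter_node_bytes(transfers, round_robin)
cost_grouped = inter_node_bytes(transfers, grouped)
print(cost_round_robin, cost_grouped)
```

Under these assumptions, the grouped placement keeps every transfer inside a node while the round-robin placement sends every transfer across the network. A mapping strategy driven by the I/O access pattern would aim for placements of the former kind.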