MagPIe: MPI's collective communication operations for clustered wide area systems
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Hardware- and Software-Based Collective Communication on the Quadrics Network
NCA '01 Proceedings of the IEEE International Symposium on Network Computing and Applications (NCA'01)
Pipelining and Overlapping for MPI Collective Operations
LCN '03 Proceedings of the 28th Annual IEEE International Conference on Local Computer Networks
Efficient implementation of reduce-scatter in MPI
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Parallel, distributed and network-based processing
Performance Analysis of MPI Collective Operations
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 15 - Volume 16
Statistical Models for Empirical Search-Based Performance Tuning
International Journal of High Performance Computing Applications
On optimizing collective communication
CLUSTER '04 Proceedings of the 2004 IEEE International Conference on Cluster Computing
The component structure of a self-adapting numerical software system
International Journal of Parallel Programming - Special issue: The next generation software program
Design and implementation of message-passing services for the Blue Gene/L supercomputer
IBM Journal of Research and Development
MPI Reduction Operations for Sparse Floating-point Data
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Expert Systems with Applications: An International Journal
International Journal of Knowledge Engineering and Soft Data Paradigms
Optimization principles for collective neighborhood communications
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
We explore the applicability of the quadtree encoding method to the run-time MPI collective algorithm selection problem. Measured algorithm performance data was used to construct quadtrees with different properties. The quality and performance of generated decision functions and in-memory decision systems were evaluated. Experimental data shows that in some cases, a decision function based on a quadtree structure with a mean depth of three, incurs on average as little as a 5% performance penalty. In all cases, experimental data can be fully represented with a quadtree containing a maximum of six levels. Our results indicate that quadtrees may be a feasible choice for both processing of the performance data and automatic decision function generation.