Runtime error detection tools detect many classes of MPI usage errors, including errors in collective communication calls. However, they often face scalability challenges. We present runtime checks for MPI collective operations that use a Tree-Based Overlay Network (TBON) for scalability and that provide full datatype matching. While we can use transitive correctness properties for most checks, some collective operations, e.g., MPI_Alltoallv, impose non-transitive correctness properties; for these we use intralayer communication within the TBON to distribute datatype matching information. An overhead study with stress tests and two benchmark suites demonstrates applicability and scalability at 4,096, 2,048, and 16,384 processes, respectively.
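The distinction between transitive and non-transitive checks can be illustrated with a minimal Python sketch. It is not the tool's implementation: the names `Summary`, `merge`, `reduce_tree`, and `alltoallv_mismatches` are hypothetical, the tree is simulated in a single process, and plain integer counts stand in for full datatype signatures. A transitive property (here, all ranks passing the same root argument) can be checked by forwarding one representative per subtree toward the TBON root. The pairwise condition of MPI_Alltoallv (what rank i sends to rank j must match what rank j expects from rank i) cannot be summarized that way, so the matching information has to be exchanged between the parties holding it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Summary:
    """Hypothetical record a tool node forwards toward the tree root."""
    value: Optional[int]  # representative argument; None after a mismatch
    ok: bool              # True while every rank in the subtree agrees


def merge(children: List[Summary]) -> Summary:
    """Transitive check: compare each child against the first one only."""
    first = children[0]
    ok = all(c.ok and c.value == first.value for c in children)
    return Summary(first.value if ok else None, ok)


def reduce_tree(leaves: List[Summary], fanout: int) -> Summary:
    """Simulate a tree reduction over a non-empty list of per-rank summaries."""
    level = list(leaves)
    while len(level) > 1:
        level = [merge(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]


def alltoallv_mismatches(send_counts: List[List[int]],
                         recv_counts: List[List[int]]) -> List[Tuple[int, int]]:
    """Non-transitive check: every (sender i, receiver j) pair is compared.

    Assumption: counts stand in for type signatures to keep the sketch short.
    """
    n = len(send_counts)
    return [(i, j) for i in range(n) for j in range(n)
            if send_counts[i][j] != recv_counts[j][i]]
```

In the transitive case each tree node forwards a constant-size record regardless of how many ranks sit below it, which is what makes the tree reduction scale. In the pairwise case the information needed by one comparison lives in two different subtrees, which is the situation the abstract addresses with intralayer communication.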