Dynamic software testing of MPI applications with umpire
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Fast, Centralized Detection and Resolution of Distributed Deadlocks in the Generalized Model
IEEE Transactions on Software Engineering
Toward Scalable Performance Visualization with Jumpshot
International Journal of High Performance Computing Applications
Concurrent deadlock detection in parallel programs
International Journal of Computers and Applications
PNMPI tools: a whole lot greater than the sum of their parts
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
FlowChecker: Detecting Bugs in MPI Libraries via Message Flow Checking
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Precise dynamic analysis for slack elasticity: adding buffering without adding bugs
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
GRace: a low-overhead mechanism for detecting data races in GPU programs
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Verification and coverage of message passing multicore applications
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special section on verification challenges in the concurrent world
Probabilistic diagnosis of performance faults in large-scale parallel applications
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
MPI runtime error detection with MUST: advances in deadlock detection
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
UPC-CHECK: a scalable tool for detecting run-time errors in Unified Parallel C
Computer Science - Research and Development
Distributed wait state tracking for runtime MPI deadlock detection
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
MPI runtime error detection with MUST: Advances in deadlock detection
Scientific Programming - Selected Papers from Super Computing 2012
Hi-index | 0.00 |
The MPI standard defines several usage patterns that can lead to deadlock, some of which involve collective communications or non-deterministic operations such as wildcard receives. Further, some MPI programming deadlocks only occur for some MPI implementations or certain configurations. Many tools to detect MPI deadlocks exist; however, none precisely handles the increased complexity of deadlock detection created by the richness of the MPI standard, which requires a general deadlock model. We present the first general deadlock model for MPI including a novel necessary and sufficient criterion, the OR-Knot, for deadlock in MPI programs. This model enables visualization of MPI deadlocks and motivates the design of a new deadlock detection mechanism. We compare our implementation of this mechanism to the ad-hoc mechanism previously available in Umpire, which reflected MPI non-determinism and, thus, more completely detected MPI deadlocks than any other existing MPI deadlock detection tool. Overall, our results demonstrate that our mechanism improves performance by as much as two orders of magnitude while providing precise characterization of deadlocks.