Dynamic software testing of MPI applications with Umpire

Authors:
Jeffrey S. Vetter; Bronis R. de Supinski
Affiliations:
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, California (both authors)
Venue:
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Year:
2000

Abstract

As evidenced by the popularity of MPI (Message Passing Interface), message passing is an effective programming technique for managing coarse-grained concurrency on distributed computers. Unfortunately, debugging message-passing applications can be difficult: software complexity, data races, and scheduling dependencies can make programming errors hard to locate with manual, interactive debugging techniques. This article describes Umpire, a new tool for detecting programming errors at runtime in message-passing applications. Umpire monitors an application's MPI operations by interposing itself between the application and the MPI runtime system through the MPI profiling layer, and then checks the application's MPI behavior for specific errors. Our initial collection of checks covers deadlock, mismatched collective operations, and resource exhaustion. We present an evaluation on a variety of applications that demonstrates the effectiveness of this approach.
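The abstract lists deadlock among Umpire's checks but does not describe the detection algorithm. One common way to frame the problem is as cycle detection in a wait-for graph: each task blocked in a call such as a blocking receive from a specific source points to the task it is waiting on, and a cycle means none of those tasks can make progress. The C sketch below is a minimal, hypothetical illustration under a simplified model in which each blocked task waits on at most one other task. The names `has_deadlock`, `waits_on`, and `MAX_TASKS` are illustrative assumptions, not part of Umpire or MPI, and wildcard receives (`MPI_ANY_SOURCE`) and collectives, which create multiple dependencies per task, are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TASKS 64  /* hypothetical upper bound on the number of tasks */

/* Hypothetical model (not Umpire's data structure): waits_on[i] is the rank
 * that task i is blocked on (e.g., a blocking receive from a specific
 * source), or -1 if task i is not blocked. Each task waits on at most one
 * other task, so the wait-for graph has out-degree <= 1. */

/* Returns true if some set of tasks are all blocked waiting on each other,
 * i.e., the wait-for graph contains a cycle. */
bool has_deadlock(const int waits_on[], int ntasks)
{
    /* 0 = unvisited, 1 = on the current path, 2 = fully explored */
    int state[MAX_TASKS] = {0};

    assert(ntasks >= 0 && ntasks <= MAX_TASKS);
    for (int i = 0; i < ntasks; i++)
        assert(waits_on[i] >= -1 && waits_on[i] < ntasks);

    for (int start = 0; start < ntasks; start++) {
        int t = start;
        /* Follow the chain of waits until it ends or reaches a seen task. */
        while (t != -1 && state[t] == 0) {
            state[t] = 1;
            t = waits_on[t];
        }
        /* Stopping on a task still on the current path closes a cycle. */
        bool cycle = (t != -1 && state[t] == 1);

        /* Mark every task on this path as fully explored. */
        for (int u = start; u != -1 && state[u] == 1; u = waits_on[u])
            state[u] = 2;

        if (cycle)
            return true;
    }
    return false;
}
```

For example, with three tasks where task 0 waits on 1, task 1 waits on 2, and task 2 waits on 0 (`{1, 2, 0}`), `has_deadlock` returns true; if task 2 is instead running (`{1, 2, -1}`), it returns false. In a tool built along these lines, the `waits_on` table would presumably be updated whenever a profiling-layer wrapper observes a task entering or leaving a blocking call; that wiring is an assumption here and is not shown.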