As the popularity of MPI (Message Passing Interface) shows, message passing is an effective programming technique for managing coarse-grained concurrency on distributed computers. Unfortunately, message-passing applications can be difficult to debug: software complexity, data races, and scheduling dependencies make programming errors hard to locate with manual, interactive debugging techniques. This article describes Umpire, a new tool for detecting programming errors at runtime in message-passing applications. Umpire monitors an application's MPI operations by interposing itself between the application and the MPI runtime system through the MPI profiling layer, and then checks the application's MPI behavior for specific errors. The initial set of errors it detects includes deadlock, mismatched collective operations, and resource exhaustion. We present an evaluation on a variety of applications that demonstrates the effectiveness of this approach.
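To make two of the listed checks concrete, here is a minimal sketch of how they might operate on information recorded by an interposition layer. It is not Umpire's implementation, which the abstract does not describe. The function names, the wait-for map, and the per-rank call lists are all hypothetical, and the sketch assumes a single communicator and at most one outstanding blocking operation per rank.

```python
from typing import Dict, List, Optional, Tuple


def find_wait_cycle(waits_for: Dict[int, int]) -> Optional[List[int]]:
    """Return one cycle of ranks in a wait-for graph, or None.

    Hypothetical input: waits_for[r] == s means rank r is blocked until
    rank s acts (for example, r sits in a blocking receive from s).
    """
    for start in waits_for:
        path: List[int] = []
        seen = set()
        rank = start
        # Follow the chain of blocked ranks until it ends or repeats.
        while rank in waits_for and rank not in seen:
            seen.add(rank)
            path.append(rank)
            rank = waits_for[rank]
        if rank in seen:
            # The chain returned to a rank already on the path: a cycle.
            return path[path.index(rank):]
    return None


def first_collective_mismatch(
    calls: Dict[int, List[str]],
) -> Optional[Tuple[int, Dict[int, Optional[str]]]]:
    """Return the first position where ranks disagree on the collective.

    Hypothetical input: calls[r] is the ordered list of collective
    operation names rank r issued on one communicator. A rank that
    issued fewer collectives is reported with None at that position.
    """
    longest = max((len(seq) for seq in calls.values()), default=0)
    for i in range(longest):
        at_i = {
            rank: (seq[i] if i < len(seq) else None)
            for rank, seq in calls.items()
        }
        if len(set(at_i.values())) > 1:
            return i, at_i
    return None
```

In this sketch a deadlock appears as a cycle among blocked ranks, and a mismatched collective appears as the first index at which the per-rank call sequences diverge. A real checker would also need to handle wildcard receives, nonblocking operations, and multiple communicators, none of which are modeled here.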