The trend toward many-core multiprocessor systems and clusters will make systems with tens to hundreds of processors more widely available. Current manual debugging techniques do not scale well to such large systems. Advanced automated debugging tools are needed for standard programming models based on commodity computing, such as threads and MPI. We surveyed MPI users to identify the kinds of MPI errors they encounter and classified the errors into several types. We describe how automated tools can detect such errors and present the Intel® Message Checker (IMC) technology being developed at the Intel Advanced Computing Center. IMC automatically detects several kinds of MPI errors, including various types of mismatches, race conditions, deadlocks and potential deadlocks, and resource misuse. Finally, we review the usability and uniqueness of IMC and discuss our future plans.
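As a rough illustration of one error class named above, the sketch below shows how a tool might flag a deadlock from recorded information about blocked processes. It is not IMC's algorithm, which the abstract does not describe. It assumes a simplified model in which each blocked rank waits on exactly one peer (for example, a blocking receive from a specific source), and the function name `find_wait_cycle` and the `waits_for` mapping are hypothetical.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_for: Dict[int, int]) -> Optional[List[int]]:
    """Return the ranks forming a cycle of blocked waits, or None.

    waits_for maps a blocked rank to the single rank it is waiting on
    (an assumed, simplified trace format; real MPI traces are richer).
    """
    for start in waits_for:
        path: List[int] = []
        seen = set()
        rank = start
        # Follow the chain of waits until it ends or revisits a rank.
        while rank in waits_for and rank not in seen:
            seen.add(rank)
            path.append(rank)
            rank = waits_for[rank]
        if rank in seen:
            # The chain looped back: the ranks from that point on are deadlocked.
            return path[path.index(rank):]
    return None


if __name__ == "__main__":
    # Rank 0 waits on rank 1 and rank 1 waits on rank 0: a classic deadlock.
    print(find_wait_cycle({0: 1, 1: 0}))
    # Rank 2 is not blocked, so the chain 0 -> 1 -> 2 can still make progress.
    print(find_wait_cycle({0: 1, 1: 2}))
```

A chain that ends at an unblocked rank is not reported, because that rank may still post the matching operation. Wildcard receives and collective operations would need a more general wait-for graph than this one-edge-per-rank model.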