NAS parallel benchmark results
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
LCLint: a tool for using specifications to check code
SIGSOFT '94 Proceedings of the 2nd ACM SIGSOFT symposium on Foundations of software engineering
SPDT '96 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Eraser: a dynamic data race detector for multithreaded programs
ACM Transactions on Computer Systems (TOCS)
Advanced compiler design and implementation
Advanced compiler design and implementation
Dynamically discovering likely program invariants to support program evolution
Proceedings of the 21st international conference on Software engineering
Design and Evaluation of System-Level Checks for On-Line Control Flow Error Detection
IEEE Transactions on Parallel and Distributed Systems
Dynamic software testing of MPI applications with umpire
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Bugs as deviant behavior: a general approach to inferring errors in systems code
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
The SLAM project: debugging system software via static analysis
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Tracking down software bugs using automatic anomaly detection
Proceedings of the 24th International Conference on Software Engineering
Pinpoint: Problem Determination in Large, Dynamic Internet Services
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
On the Use of Formal Techniques for Validation
FTCS '98 Proceedings of the The Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing
Performance debugging for distributed systems of black boxes
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Extending a traditional debugger to debug massively parallel applications
Journal of Parallel and Distributed Computing
AccMon: Automatically Detecting Memory-Related Bugs via Program Counter-Based Invariants
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Automated, scalable debugging of MPI programs with Intel® Message Checker
Proceedings of the second international workshop on Software engineering for high performance computing system applications
Problem diagnosis in large-scale computing environments
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Automated known problem diagnosis with event traces
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Valgrind: a framework for heavyweight dynamic binary instrumentation
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Using model checking to find serious file system errors
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
A Portable Method for Finding User Errors in the Usage of MPI Collective Operations
International Journal of High Performance Computing Applications
Better bug reporting with better privacy
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Combining symbolic execution with model checking to verify parallel numerical programs
ACM Transactions on Software Engineering and Methodology (TOSEM)
DMTracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
IEEE Micro
Software Faults Diagnosis in Complex OTS Based Safety Critical Systems
EDCC-7 '08 Proceedings of the 2008 Seventh European Dependable Computing Conference
Formal verification of practical MPI programs
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
A graph based approach for MPI deadlock detection
Proceedings of the 23rd international conference on Supercomputing
Assertion-Driven Development: Assessing the Quality of Contracts Using Meta-Mutations
ICSTW '09 Proceedings of the IEEE International Conference on Software Testing, Verification, and Validation Workshops
ScalaTrace: Scalable compression and replay of communication traces for high-performance computing
Journal of Parallel and Distributed Computing
Enhancing source-level programming tools with an awareness of transparent program transformations
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Scalable temporal order analysis for large scale debugging
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Toward Automated Anomaly Identification in Large-Scale Systems
IEEE Transactions on Parallel and Distributed Systems
Vrisha: using scaling properties of parallel programs for bug detection and localization
Proceedings of the 20th international symposium on High performance distributed computing
Large scale debugging of parallel tasks with AutomaDeD
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Probabilistic diagnosis of performance faults in large-scale parallel applications
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Hi-index | 0.00 |
Many MPI libraries have suffered from software bugs, which severely impact the productivity of a large number of users. This paper presents a new method called FlowChecker for detecting communication-related bugs inMPI libraries. The main idea is to extract program intentions of message passing (MPintentions), and to check whether theseMP-intentions are fulfilled correctly by the underlying MPI libraries, i.e., whether messages are delivered correctly from specified sources to specified destinations. If not, FlowChecker reports the bugs and provides diagnostic information. We have built a FlowChecker prototype on Linux and evaluated it with five real-world bug cases in three widely-used MPI libraries, including Open MPI, MPICH2, and MVAPICH2. Our experimental results show that FlowChecker effectively detects all five evaluated bug cases and provides useful diagnostic information. Additionally, our experiments with HPL and NPB show that FlowChecker incurs low runtime overhead (0.9-9.7% on three MPI libraries).