Techniques for debugging parallel programs with flowback analysis
ACM Transactions on Programming Languages and Systems (TOPLAS)
Detecting data races on weak memory systems
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Control-flow analysis of higher-order languages of taming lambda
Control-flow analysis of higher-order languages of taming lambda
What are race conditions?: Some issues and formalizations
ACM Letters on Programming Languages and Systems (LOPLAS)
Eraser: a dynamic data race detector for multithreaded programs
ACM Transactions on Computer Systems (TOCS)
Dynamic software testing of MPI applications with umpire
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
A framework for reducing the cost of instrumented code
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Efficient and precise datarace detection for multithreaded object-oriented programs
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
A performance analysis of the Berkeley UPC compiler
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
KISS: keep it simple and sequential
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Low-overhead memory leak detection using adaptive statistical profiling
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Parameterized object sensitivity for points-to analysis for Java
ACM Transactions on Software Engineering and Methodology (TOSEM)
Automated, scalable debugging of MPI programs with Intel® Message Checker
Proceedings of the second international workshop on Software engineering for high performance computing system applications
AVIO: detecting atomicity violations via access interleaving invariants
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Accurate and efficient filtering for the Intel thread checker race detector
Proceedings of the 1st workshop on Architectural and system support for improving software dependability
CheckFence: checking consistency of concurrent data types on relaxed memory models
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
ISP: a tool for model checking MPI programs
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Velodrome: a sound and complete dynamic atomicity checker for multithreaded programs
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Scalable Dynamic Load Balancing Using UPC
ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
FastTrack: efficient and precise dynamic race detection
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
LiteRace: effective sampling for lightweight data-race detection
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
CalFuzzer: An Extensible Active Testing Framework for Concurrent Programs
CAV '09 Proceedings of the 21st International Conference on Computer Aided Verification
A Scalable and Distributed Dynamic Formal Verifier for MPI Programs
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Automatic formal verification of MPI-based parallel programs
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
QVM: An Efficient Runtime for Detecting Defects in Deployed Systems
ACM Transactions on Software Engineering and Methodology (TOSEM)
Efficient data race detection for distributed memory parallel programs
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Large Scale Verification of MPI Programs Using Lamport Clocks with Lazy Update
PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Dynamic deadlock analysis of multi-threaded programs
HVC'05 Proceedings of the First Haifa international conference on Hardware and Software Verification and Testing
Scalable and precise dynamic datarace detection for structured parallelism
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Hi-index | 0.00 |
Contemporary and future programming languages for HPC promote hybrid parallelism and shared memory abstractions using a global address space. In this programming style, data races occur easily and are notoriously hard to find. Existing state-of-the-art data race detectors exhibit 10X-100X performance degradation and do not handle hybrid parallelism. In this paper we present the first complete implementation of data race detection at scale for UPC programs. Our implementation tracks local and global memory references in the program and it uses two techniques to reduce the overhead: 1) hierarchical function and instruction level sampling; and 2) exploiting the runtime persistence of aliasing and locality specific to Partitioned Global Address Space applications. The results indicate that both techniques are required in practice: well optimized instruction sampling introduces overheads as high as 6500% (65X slowdown), while each technique in separation is able to reduce it only to 1000% (10X slowdown). When applying the optimizations in conjunction our tool finds all previously known data races in our benchmark programs with at most 50% overhead when running on 2048 cores. Furthermore, while previous results illustrate the benefits of function level sampling, our experiences show that this technique does not work for scientific programs: instruction sampling or a hybrid approach is required.