An empirical comparison of monitoring algorithms for access anomaly detection
PPOPP '90 Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming
Improving the accuracy of data race detection
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
SPDT '96 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Online data-race detection via coherency guarantees
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
Eraser: a dynamic data race detector for multithreaded programs
ACM Transactions on Computer Systems (TOCS)
Type-based race detection for Java
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
Dynamic software testing of MPI applications with umpire
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Efficient and precise datarace detection for multithreaded object-oriented programs
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Ownership types for safe programming: preventing data races and deadlocks
OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Hybrid dynamic data race detection
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Efficient on-the-fly data race detection in multithreaded C++ programs
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
RacerX: effective, static detection of race conditions and deadlocks
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Race checking by context inference
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Extending a traditional debugger to debug massively parallel applications
Journal of Parallel and Distributed Computing
Trust but verify: monitoring remotely executing programs for progress and correctness
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
RaceTrack: efficient detection of data race conditions via adaptive tracking
Proceedings of the twentieth ACM symposium on Operating systems principles
Using Dynamic Tracing Sampling to Measure Long Running Programs
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Automated, scalable debugging of MPI programs with Intel® Message Checker
Proceedings of the second international workshop on Software engineering for high performance computing system applications
A Portable Method for Finding User Errors in the Usage of MPI Collective Operations
International Journal of High Performance Computing Applications
Learning from mistakes: a comprehensive study on real world concurrency bug characteristics
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
DMTracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
OpenMP to GPGPU: a compiler framework for automatic translation and optimization
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
A graph based approach for MPI deadlock detection
Proceedings of the 23rd international conference on Supercomputing
A translation system for enabling data mining applications on GPUs
Proceedings of the 23rd international conference on Supercomputing
A framework for efficient and scalable execution of domain-specific templates on GPUs
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
A Tool for Detecting First Races in OpenMP Programs
PaCT '09 Proceedings of the 10th International Conference on Parallel Computing Technologies
Managing contention for shared resources on multicore processors
Communications of the ACM
Scalable temporal order analysis for large scale debugging
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Proceedings of the 24th ACM International Conference on Supercomputing
Proceedings of the 24th ACM International Conference on Supercomputing
Scalable SMT-based verification of GPU kernel functions
Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
GPU-based NFA implementation for memory efficient high speed regular expression matching
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
GKLEE: concolic verification and test generation for GPUs
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Parametric flows: automated behavior equivalencing for symbolic analysis of races in CUDA programs
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Symbolic testing of OpenCL code
HVC'11 Proceedings of the 7th international Haifa Verification conference on Hardware and Software: verification and testing
Hi-index | 0.00 |
In recent years, GPUs have emerged as an extremely cost-effective means for achieving high performance. Many application developers, including those with no prior parallel programming experience, are now trying to scale their applications using GPUs. While languages like CUDA and OpenCL have eased GPU programming for non-graphical applications, they are still explicitly parallel languages. All parallel programmers, particularly the novices, need tools that can help ensuring the correctness of their programs. Like any multithreaded environment, data races on GPUs can severely affect the program reliability. Thus, tool support for detecting race conditions can significantly benefit GPU application developers. Existing approaches for detecting data races on CPUs or GPUs have one or more of the following limitations: 1) being illsuited for handling non-lock synchronization primitives on GPUs; 2) lacking of scalability due to the state explosion problem; 3) reporting many false positives because of simplified modeling; and/or 4) incurring prohibitive runtime and space overhead. In this paper, we propose GRace, a new mechanism for detecting races in GPU programs that combines static analysis with a carefully designed dynamic checker for logging and analyzing information at runtime. Our design utilizes GPUs memory hierarchy to log runtime data accesses efficiently. To improve the performance, GRace leverages static analysis to reduce the number of statements that need to be instrumented. Additionally, by exploiting the knowledge of thread scheduling and the execution model in the underlying GPUs, GRace can accurately detect data races with no false positives reported. Based on the above idea, we have built a prototype of GRace with two schemes, i.e., GRace-stmt and GRace-addr, for NVIDIA GPUs. Both schemes are integrated with the same static analysis. We have evaluated GRace-stmt and GRace-addr with three data race bugs in three GPU kernel functions and also have compared them with the existing approach, referred to as B-tool. Our experimental results show that both schemes of GRace are effective in detecting all evaluated cases with no false positives, whereas Btool reports many false positives for one evaluated case. On the one hand, GRace-addr incurs low runtime overhead, i.e., 22-116%, and low space overhead, i.e., 9-18MB, for the evaluated kernels. On the other hand, GRace-stmt offers more help in diagnosing data races with larger overhead.