Scaling data race detection for partitioned global address space programs

Authors:
Chang Seo Park;Koushik Sen;Costin Iancu
Affiliations:
UC Berkeley, Berkeley, CA, USA;UC Berkeley, Berkeley, CA, USA;LBNL, Berkeley, CA, USA
Venue:
Proceedings of the 27th ACM International Conference on Supercomputing
Year:
2013

Abstract

Contemporary and future programming languages for HPC promote hybrid parallelism and shared memory abstractions using a global address space. In this programming style, data races occur easily and are notoriously hard to find. Existing state-of-the-art data race detectors exhibit 10X-100X performance degradation and do not handle hybrid parallelism. In this paper we present the first complete implementation of data race detection at scale for UPC programs. Our implementation tracks local and global memory references in the program and uses two techniques to reduce the overhead: 1) hierarchical function- and instruction-level sampling; and 2) exploiting the runtime persistence of aliasing and locality specific to Partitioned Global Address Space applications. The results indicate that both techniques are required in practice: well-optimized instruction sampling introduces overheads as high as 6500% (65X slowdown), while each technique in isolation reduces the overhead only to 1000% (10X slowdown). When the optimizations are applied together, our tool finds all previously known data races in our benchmark programs with at most 50% overhead when running on 2048 cores. Furthermore, while previous results illustrate the benefits of function-level sampling, our experience shows that this technique does not work for scientific programs: instruction sampling or a hybrid approach is required.
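
The idea of hierarchical sampling named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `HierarchicalSampler`, the helper `run`, and the fixed-period counters are all assumptions made for illustration, and the real tool may well choose samples adaptively or randomly. The sketch only shows the two-level decision: first whether a function invocation is instrumented at all, then which memory accesses inside an instrumented invocation are logged.

```python
class HierarchicalSampler:
    """Hypothetical two-level sampler: pick invocations first, then accesses."""

    def __init__(self, func_period, access_period):
        self.func_period = func_period      # sample every func_period-th call of a function
        self.access_period = access_period  # in a sampled call, log every access_period-th access
        self.calls = {}                     # per-function invocation counters
        self.accesses = 0                   # access counter for the current invocation
        self.active = False                 # True while inside a sampled invocation

    def enter(self, func):
        """Function-level decision, made once per invocation (no nesting in this sketch)."""
        n = self.calls.get(func, 0)
        self.calls[func] = n + 1
        self.active = (n % self.func_period == 0)
        self.accesses = 0
        return self.active

    def should_log(self):
        """Instruction-level decision, made once per memory access."""
        if not self.active:
            return False                    # unsampled invocation: skip all accesses
        hit = (self.accesses % self.access_period == 0)
        self.accesses += 1
        return hit


def run(sampler, func, n_accesses):
    """Simulate one call of func with n_accesses accesses; return the logged access indices."""
    sampler.enter(func)
    return [i for i in range(n_accesses) if sampler.should_log()]
```

With `HierarchicalSampler(func_period=2, access_period=3)`, the first call of a function has every third access logged, the second call is skipped entirely, and the third call is sampled again. Setting `func_period=1` degenerates to pure instruction-level sampling, which the abstract reports as necessary (alone or in a hybrid) for scientific programs.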