Light64: lightweight hardware support for data race detection during systematic testing of parallel programs

Authors:
Adrian Nistor;Darko Marinov;Josep Torrellas
Affiliations:
University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign
Venue:
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2009

Abstract

Developing and testing parallel code is hard. Even for one given input, a parallel program can have many possible different thread interleavings, which are hard for the programmer to foresee and for a testing tool to cover using stress or random testing. For this reason, a recent trend is to use Systematic Testing, which methodically explores different thread interleavings, while checking for various bugs. Data races are common bugs but, unfortunately, checking for races is often skipped in systematic testers because it introduces substantial runtime overhead if done purely in software. Recently, several techniques for race detection in hardware have been proposed, but they still require significant hardware support. This paper presents Light64, a novel technique for data race detection during systematic testing that has both small runtime overhead and very lightweight hardware requirements. Light64 is based on the observation that two thread interleavings in which racing accesses are flipped will very likely exhibit some deviation in their program execution history. Light64 computes a 64-bit hash of the program execution history during systematic testing. If the hashes of two interleavings with the same happens-before graph differ, then a race has occurred. Light64 only needs a 64-bit register per core, a drastic improvement over previous hardware schemes. In addition, our experiments on SPLASH-2 applications show that Light64 has no false positives, detects 96% of races, and induces only a small slowdown for race-free executions: on average, 1% and 37% in two different modes.
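
The hash-comparison idea can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the paper's hardware design: the abstract does not say what the "execution history" contains or which hash function is used, so the sketch assumes the history is the sequence of values each thread loads and uses an FNV-1a-style 64-bit mix. The names `mix`, `run`, `race_detected`, `RACY`, and `RACE_FREE` are hypothetical. The caller is assumed to pass only schedules that share the same happens-before graph (here, programs with no synchronization at all).

```python
# Illustrative sketch only: the history definition and the hash function
# below are assumptions, not details taken from the Light64 paper.
MASK64 = (1 << 64) - 1
FNV_OFFSET = 0xCBF29CE484222325  # FNV-1a 64-bit offset basis
FNV_PRIME = 0x100000001B3        # FNV-1a 64-bit prime


def mix(h, value):
    """Fold one integer into a 64-bit running hash (FNV-1a style)."""
    return ((h ^ (value & MASK64)) * FNV_PRIME) & MASK64


def run(program, schedule):
    """Execute `program` under `schedule` and return a combined 64-bit hash.

    `program` maps a thread id to a list of ops:
    ("store", addr, value) or ("load", addr).
    `schedule` lists the thread id that executes at each step.
    Each thread keeps one 64-bit hash, standing in for a per-core register.
    """
    memory = {}
    pc = {t: 0 for t in program}
    per_thread = {t: FNV_OFFSET for t in program}
    for t in schedule:
        op = program[t][pc[t]]
        pc[t] += 1
        if op[0] == "store":
            memory[op[1]] = op[2]
        else:  # "load": the observed value becomes part of the history
            per_thread[t] = mix(per_thread[t], memory.get(op[1], 0))
    combined = FNV_OFFSET
    for t in sorted(per_thread):
        combined = mix(mix(combined, t), per_thread[t])
    return combined


def race_detected(program, schedule_a, schedule_b):
    """Report a race if two interleavings yield different hashes.

    Assumes both schedules have the same happens-before graph.
    """
    return run(program, schedule_a) != run(program, schedule_b)


# Thread 0 writes x and thread 1 reads x with no synchronization: a race.
RACY = {0: [("store", "x", 1)], 1: [("load", "x")]}

# Each thread touches only its own variable: no race.
RACE_FREE = {
    0: [("store", "a", 1), ("load", "a")],
    1: [("store", "b", 2), ("load", "b")],
}
```

For `RACY`, flipping the two accesses changes the value thread 1 loads, so the hashes of schedules `[0, 1]` and `[1, 0]` differ. For `RACE_FREE`, every interleaving loads the same values, so the hashes match.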