Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Parallelizing dynamic information flow tracking
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Atom-Aid: Detecting and Surviving Atomicity Violations
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Flexible Hardware Acceleration for Instruction-Grain Program Monitoring
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
SigRace: signature-based data race detection
Proceedings of the 36th annual international symposium on Computer architecture
NUDA: a non-uniform debugging architecture and non-intrusive race detection for many-core
Proceedings of the 46th Annual Design Automation Conference
Unit testing for multi-threaded Java programs
Proceedings of the 7th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Finding concurrency bugs with context-aware communication graphs
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Butterfly analysis: adapting dataflow analysis to dynamic parallel monitoring
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Umbra: efficient and scalable memory shadowing
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Efficient memory shadowing for 64-bit architectures
Proceedings of the 2010 international symposium on Memory management
Proceedings of the 37th annual international symposium on Computer architecture
Proceedings of the 37th annual international symposium on Computer architecture
A case for system support for concurrency exceptions
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Composable specifications for structured shared-memory communication
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
RunAssert: a non-intrusive run-time assertion for parallel programs debugging
Proceedings of the Conference on Design, Automation and Test in Europe
Log-based architectures: using multicore to help software behave correctly
ACM SIGOPS Operating Systems Review
RACEZ: a lightweight and non-invasive race detection tool for production applications
Proceedings of the 33rd International Conference on Software Engineering
Data-race exceptions have benefits beyond the memory model
Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Understanding bloom filter intersection for lazy address-set disambiguation
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Isolating and understanding concurrency errors using reconstructed execution fragments
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Demand-driven software race detection using hardware performance counters
Proceedings of the 38th annual international symposium on Computer architecture
Exploiting cache traffic monitoring for run-time race detection
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Accelerating data race detection with minimal hardware support
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
FlexSig: Implementing flexible hardware signatures
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Highly scalable distributed dataflow analysis
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
SuperCoP: a general, correct, and performance-efficient supervised memory system
Proceedings of the 9th conference on Computing Frontiers
RADISH: always-on sound and complete Ra Detection in Software and Hardware
Proceedings of the 39th Annual International Symposium on Computer Architecture
A survey and taxonomy of on-chip monitoring of multicore systems-on-chip
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Parallelizing data race detection
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Production-run software failure diagnosis via hardware performance counters
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Efficient concurrency-bug detection across inputs
Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
Low-level detection of language-level data races with LARD
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Leveraging the short-term memory of hardware to diagnose production-run software failures
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
The emergence of multicore architectures will lead to an increase in the use of multithreaded applications that are prone to synchronization bugs, such as data races. Software solutions for detecting data races generally incur large overheads. Hardware support for race detection can significantly reduce that overhead. However, all existing hardware proposals for race detection are based on the happens-before algorithm which is sensitive to thread interleaving and cannot detect races that are not exposed during the monitored run. The lockset algorithm addresses this limitation. Unfortunately, due to the challenging issues such as storing the lockset information and performing complex set operations, so far it has been implemented only in software with 10-30 times performance hit. This paper proposes the rst hardware implementation (called HARD) of the lockset algorithm to exploit the race detection capability of this algorithm with minimal over-head. HARD ef ciently stores lock sets in hardware bloom lters and converts the expensive set operations into fast bitwise logic operations with negligible overhead. We evaluate HARD using six SPLASH-2 applications with 60 randomly injected bugs. Our results show that HARD can detect 54 out of 60 tested bugs, 20% more than happens-before, with only 0.1-2.6% of execution overhead. We also show our hardware design is cost-effective by comparing with the ideal lockset implementation, which would require a large amount of hardware resources.