Timetraveler: exploiting acyclic races for optimizing memory race recording

Authors:
Gwendolyn Voskuilen;Faraz Ahmad;T. N. Vijaykumar
Affiliations:
Purdue University, West Lafayette, IN, USA;Purdue University, West Lafayette, IN, USA;Purdue University, West Lafayette, IN, USA
Venue:
Proceedings of the 37th annual international symposium on Computer architecture
Year:
2010

Citing 23
Cited 8

Hardware-assisted replay of multiprocessor programs

PADD '91 Proceedings of the 1991 ACM/ONR workshop on Parallel and distributed debugging
Optimal tracing and replay for debugging shared-memory parallel programs

PADD '93 Proceedings of the 1993 ACM/ONR workshop on Parallel and distributed debugging
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Generating representative Web workloads for network and server performance evaluation

SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
A scalable approach to thread-level speculation

Proceedings of the 27th annual international symposium on Computer architecture
Time, clocks, and the ordering of events in a distributed system

Communications of the ACM
Architectural support for scalable speculative parallelization in shared-memory multiprocessors

Proceedings of the 27th annual international symposium on Computer architecture
SafetyNet: improving the availability of shared memory multiprocessors with global checkpoint/recovery

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Simics: A Full System Simulation Platform

Computer
Speculative Versioning Cache

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Variability in Architectural Simulations of Multi-Threaded Workloads

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes

Proceedings of the 30th annual international symposium on Computer architecture
A "flight data recorder" for enabling full-system multiprocessor deterministic replay

Proceedings of the 30th annual international symposium on Computer architecture
Checkpointing and Its Applications

FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Transactional Memory Coherence and Consistency

Proceedings of the 31st annual international symposium on Computer architecture
BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging

Proceedings of the 32nd annual international symposium on Computer Architecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
A regulated transitive reduction (RTR) for longer memory race recording

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Recording shared memory dependencies using strata

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
BulkSC: bulk enforcement of sequential consistency

Proceedings of the 34th annual international symposium on Computer architecture
Implementing Signatures for Transactional Memory

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Rerun: Exploiting Episodes for Lightweight Memory Race Recording

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Ef?ciently

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture

Karma: scalable deterministic record-replay

Proceedings of the international conference on Supercomputing
Deterministic Replay Using Global Clock

ACM Transactions on Architecture and Code Optimization (TACO)
Cyrus: unintrusive application-level record-replay for replay parallelism

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
An efficient deterministic record-replay with separate dependencies

Computers and Electrical Engineering
QuickRec: prototyping an intel architecture extension for record and replay of multithreaded programs

Proceedings of the 40th Annual International Symposium on Computer Architecture
Micro-architectural support for metadata coherence in multi-core dynamic information flow tracking

Proceedings of the 2nd International Workshop on Hardware and Architectural Support for Security and Privacy
RelaxReplay: record and replay for relaxed-consistency multiprocessors

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

As chip multiprocessors emerge as the prevalent microprocessor architecture, support for debugging shared-memory parallel programs becomes important. A key difficulty is the programs' nondeterministic semantics due to which replay runs of a buggy program may not reproduce the bug. The non-determinism stems from memory races where accesses from two threads, at least one of which is a write, go to the same memory location. Previous hardware schemes for memory race recording log the predecessor-successor thread ordering at memory races and enforce the same orderings in the replay run to achieve deterministic replay. To reduce the log size, the schemes exploit transitivity in the orderings to avoid recording redundant orderings. To reduce the log size further while requiring minimal hardware, we propose Timetraveler which for the first time exploits acyclicity of races based on the key observation that an acyclic race need not be recorded even if the race is not covered already by transitivity. Timetraveler employs a novel and elegant mechanism called post-dating which both ensures that acyclic races, including those through the L2, are eventually ordered correctly, and identifies cyclic races. To address false cycles through the L2, Timetraveler employs another novel mechanism called time-delay buffer which delays the advancement of the L2 banks' timestamps and thereby reduces the false cycles. Using simulations, we show that Timetraveler reduces the log size for commercial workloads by 88% over the best previous approach while using only a 696-byte time-delay buffer.