Hardware-assisted replay of multiprocessor programs
PADD '91 Proceedings of the 1991 ACM/ONR workshop on Parallel and distributed debugging
Optimal tracing and replay for debugging shared-memory parallel programs
PADD '93 Proceedings of the 1993 ACM/ONR workshop on Parallel and distributed debugging
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Generating representative Web workloads for network and server performance evaluation
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
A scalable approach to thread-level speculation
Proceedings of the 27th annual international symposium on Computer architecture
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Proceedings of the 27th annual international symposium on Computer architecture
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Variability in Architectural Simulations of Multi-Threaded Workloads
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes
Proceedings of the 30th annual international symposium on Computer architecture
A "flight data recorder" for enabling full-system multiprocessor deterministic replay
Proceedings of the 30th annual international symposium on Computer architecture
Checkpointing and Its Applications
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Transactional Memory Coherence and Consistency
Proceedings of the 31st annual international symposium on Computer architecture
BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging
Proceedings of the 32nd annual international symposium on Computer Architecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
A regulated transitive reduction (RTR) for longer memory race recording
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Recording shared memory dependencies using strata
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
BulkSC: bulk enforcement of sequential consistency
Proceedings of the 34th annual international symposium on Computer architecture
Implementing Signatures for Transactional Memory
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Rerun: Exploiting Episodes for Lightweight Memory Race Recording
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Karma: scalable deterministic record-replay
Proceedings of the international conference on Supercomputing
Deterministic Replay Using Global Clock
ACM Transactions on Architecture and Code Optimization (TACO)
Cyrus: unintrusive application-level record-replay for replay parallelism
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
An efficient deterministic record-replay with separate dependencies
Computers and Electrical Engineering
Proceedings of the 40th Annual International Symposium on Computer Architecture
Micro-architectural support for metadata coherence in multi-core dynamic information flow tracking
Proceedings of the 2nd International Workshop on Hardware and Architectural Support for Security and Privacy
RelaxReplay: record and replay for relaxed-consistency multiprocessors
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
As chip multiprocessors emerge as the prevalent microprocessor architecture, support for debugging shared-memory parallel programs becomes important. A key difficulty is the programs' nondeterministic semantics due to which replay runs of a buggy program may not reproduce the bug. The non-determinism stems from memory races where accesses from two threads, at least one of which is a write, go to the same memory location. Previous hardware schemes for memory race recording log the predecessor-successor thread ordering at memory races and enforce the same orderings in the replay run to achieve deterministic replay. To reduce the log size, the schemes exploit transitivity in the orderings to avoid recording redundant orderings. To reduce the log size further while requiring minimal hardware, we propose Timetraveler which for the first time exploits acyclicity of races based on the key observation that an acyclic race need not be recorded even if the race is not covered already by transitivity. Timetraveler employs a novel and elegant mechanism called post-dating which both ensures that acyclic races, including those through the L2, are eventually ordered correctly, and identifies cyclic races. To address false cycles through the L2, Timetraveler employs another novel mechanism called time-delay buffer which delays the advancement of the L2 banks' timestamps and thereby reduces the false cycles. Using simulations, we show that Timetraveler reduces the log size for commercial workloads by 88% over the best previous approach while using only a 696-byte time-delay buffer.