Karma: scalable deterministic record-replay

Authors:
Arkaprava Basu;Jayaram Bobba;Mark D. Hill
Affiliations:
University of Wisconsin-Madison, Madison, WI, USA;Intel Corp., Hillsboro, OR, USA;University of Wisconsin-Madison, Madison, WI, USA
Venue:
Proceedings of the international conference on Supercomputing
Year:
2011

Citing 37
Cited 5

A class of compatible cache consistency protocols and their support by the IEEE futurebus

ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Debugging Parallel Programs with Instant Replay

IEEE Transactions on Computers
LimitLESS directories: A scalable cache coherence scheme

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Hardware-assisted replay of multiprocessor programs

PADD '91 Proceedings of the 1991 ACM/ONR workshop on Parallel and distributed debugging
The SPARC architecture manual (version 9)

The SPARC architecture manual (version 9)
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Time, clocks, and the ordering of events in a distributed system

Communications of the ACM
Space/time trade-offs in hash coding with allowable errors

Communications of the ACM
Transaction Processing: Concepts and Techniques

Transaction Processing: Concepts and Techniques
Simics: A Full System Simulation Platform

Computer
The Stanford Hydra CMP

IEEE Micro
ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes

Proceedings of the 30th annual international symposium on Computer architecture
A "flight data recorder" for enabling full-system multiprocessor deterministic replay

Proceedings of the 30th annual international symposium on Computer architecture
TSOtool: A Program for Verifying Memory Systems Using the Memory Consistency Model

Proceedings of the 31st annual international symposium on Computer architecture
ReVirt: enabling intrusion analysis through virtual-machine logging and replay

OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementation
BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging

Proceedings of the 32nd annual international symposium on Computer Architecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
ExtraVirt: detecting and recovering from transient processor faults

Proceedings of the twentieth ACM symposium on Operating systems principles
Bulk Disambiguation of Speculative Threads in Multiprocessors

Proceedings of the 33rd annual international symposium on Computer Architecture
A regulated transitive reduction (RTR) for longer memory race recording

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Recording shared memory dependencies using strata

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
BulkSC: bulk enforcement of sequential consistency

Proceedings of the 34th annual international symposium on Computer architecture
How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs

IEEE Transactions on Computers
Execution replay of multiprocessor virtual machines

Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Rerun: Exploiting Episodes for Lightweight Memory Race Recording

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Capo: a software-hardware interface for practical deterministic multiprocessor replay

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
DMP: deterministic shared memory multiprocessing

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Kendo: efficient deterministic multithreading in software

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Dependence-aware transactional memory for increased concurrency

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A case for an interleaving constrained shared-memory multi-processor

Proceedings of the 36th annual international symposium on Computer architecture
PRES: probabilistic replay with execution sketching on multiprocessors

Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
ODR: output-deterministic replay for multicore debugging

Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Architecting a chunk-based memory race recorder in modern CMPs

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
CoreDet: a compiler and runtime system for deterministic multithreaded execution

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Respec: efficient online multiprocessor replay via speculation and external determinism

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Timetraveler: exploiting acyclic races for optimizing memory race recording

Proceedings of the 37th annual international symposium on Computer architecture

CoreRacer: a practical memory race recorder for multicore x86 TSO processors

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A survey and taxonomy of on-chip monitoring of multicore systems-on-chip

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Cyrus: unintrusive application-level record-replay for replay parallelism

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
QuickRec: prototyping an intel architecture extension for record and replay of multithreaded programs

Proceedings of the 40th Annual International Symposium on Computer Architecture
RelaxReplay: record and replay for relaxed-consistency multiprocessors

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Abstract

Recent research in deterministic record-replay seeks to ease debugging, security, and fault tolerance on otherwise nondeterministic multicore systems. Handling shared-memory races, which can occur on any memory reference, is an important challenge that hardware support can make more efficient. Recent proposals record how long threads run in isolation on top of snooping coherence (IMRR), implicit transactions (DeLorean), or directory coherence (Rerun). As core counts scale, Rerun's directory-based parallel recording becomes more attractive, but its nearly sequential replay becomes unacceptably slow. This paper proposes Karma for both scalable recording and replay. Karma builds an episodic memory race recorder using a conventional directory cache coherence protocol and records the order of the episodes as a directed acyclic graph. Karma also enables extension of episodes even after some conflicts. During replay, Karma uses wakeup messages to trigger a partially ordered parallel episode replay. Results with several commercial workloads on a 16-core system show that Karma can achieve replay speed (a) within 19%-28% of native execution speed without record-replay and (b) four times faster than even an idealized Rerun replay. Additional results explore tradeoffs between log size and replay speed.
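
The idea of replaying episodes in a partial order, with each finished episode waking its successors, can be illustrated with a small software sketch. This is not the hardware mechanism from the paper. It is a minimal model that assumes the recorded log is available as a list of episode ids plus (predecessor, successor) edges, and the function name `replay_waves` and the episode labels are hypothetical. Each episode keeps a count of outstanding predecessors; a "wakeup" decrements that count, and an episode becomes runnable when the count reaches zero.

```python
from collections import defaultdict


def replay_waves(episodes, edges):
    """Group episodes into waves that could replay concurrently.

    episodes: iterable of comparable, hashable episode ids (hypothetical labels)
    edges:    iterable of (pred, succ) pairs; pred must finish before succ starts.
              Every id used in edges is assumed to appear in episodes.
    """
    successors = defaultdict(list)
    pending = {ep: 0 for ep in episodes}  # outstanding predecessors per episode
    for pred, succ in edges:
        successors[pred].append(succ)
        pending[succ] += 1

    # Episodes with no predecessors can start immediately.
    ready = sorted(ep for ep, count in pending.items() if count == 0)
    waves = []
    replayed = 0
    while ready:
        waves.append(ready)
        replayed += len(ready)
        next_ready = []
        for ep in ready:
            for succ in successors[ep]:
                pending[succ] -= 1  # models one wakeup message to a successor
                if pending[succ] == 0:
                    next_ready.append(succ)
        ready = sorted(next_ready)

    if replayed != len(pending):
        raise ValueError("episode ordering contains a cycle; expected a DAG")
    return waves


# Two threads, A and B, each with two episodes in program order.
# The edge A0 -> B1 stands in for a recorded cross-thread dependence.
example = replay_waves(
    ["A0", "A1", "B0", "B1"],
    [("A0", "A1"), ("B0", "B1"), ("A0", "B1")],
)
print(example)
```

In this example the four episodes complete in two waves, whereas a strictly sequential replay would need four steps. The gap between the two is the kind of parallelism a partially ordered replay is meant to expose.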