Memory access buffering in multiprocessors
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Correct memory operation of cache-based multiprocessors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Debugging Parallel Programs with Instant Replay
IEEE Transactions on Computers
Hardware-assisted replay of multiprocessor programs
PADD '91 Proceedings of the 1991 ACM/ONR workshop on Parallel and distributed debugging
Optimal tracing and replay for debugging shared-memory parallel programs
PADD '93 Proceedings of the 1993 ACM/ONR workshop on Parallel and distributed debugging
On testing cache-coherent shared memories
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Memory consistency and event ordering in scalable shared-memory multiprocessors
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
A "flight data recorder" for enabling full-system multiprocessor deterministic replay
Proceedings of the 30th annual international symposium on Computer architecture
ITC '99 Proceedings of the 1999 IEEE International Test Conference
ReVirt: enabling intrusion analysis through virtual-machine logging and replay
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging
Proceedings of the 32nd annual international symposium on Computer Architecture
Memory Model = Instruction Reordering + Store Atomicity
Proceedings of the 33rd annual international symposium on Computer Architecture
The good, the bad, and the ugly of silicon debug
Proceedings of the 43rd annual Design Automation Conference
A regulated transitive reduction (RTR) for longer memory race recording
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Recording shared memory dependencies using strata
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
An embedded multi-resolution AMBA trace analyzer for microprocessor-based SoC integration
Proceedings of the 44th annual Design Automation Conference
How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs
IEEE Transactions on Computers
First silicon functional validation and debug of multicore microprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Rerun: Exploiting Episodes for Lightweight Memory Race Recording
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
An embedded infrastructure of debug and trace interface for the DSP platform
Proceedings of the 45th annual Design Automation Conference
You can catch more bugs with transaction level honey
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
MPIWiz: subgroup reproducible replay of mpi applications
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Capo: a software-hardware interface for practical deterministic multiprocessor replay
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
DMP: deterministic shared memory multiprocessing
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
ODR: output-deterministic replay for multicore debugging
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
R2: an application-level kernel for record and replay
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Fast and generalized polynomial time memory consistency verification
CAV'06 Proceedings of the 18th international conference on Computer Aided Verification
ORDER: object centric deterministic replay for Java
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
CoreRacer: a practical memory race recorder for multicore x86 TSO processors
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Deterministic Replay Using Global Clock
ACM Transactions on Architecture and Code Optimization (TACO)
Cyrus: unintrusive application-level record-replay for replay parallelism
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Proceedings of the 40th Annual International Symposium on Computer Architecture
Micro-architectural support for metadata coherence in multi-core dynamic information flow tracking
Proceedings of the 2nd International Workshop on Hardware and Architectural Support for Security and Privacy
RelaxReplay: record and replay for relaxed-consistency multiprocessors
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
Debugging parallel program is a well-known difficult problem. A promising method to facilitate debugging parallel program is using hardware support to achieve deterministic replay. A hardware-assisted deterministic replay scheme should have a small log size, as well as low design cost, to be feasible for adopting by industrial processors. To achieve the goals, we propose a novel and succinct hardware-assisted deterministic replay scheme named LReplay. The key innovation of LReplay is that instead of recording the logical time orders between instructions or instruction blocks as previous investigations, LReplay is built upon recording the pending period information [6]. According to the experimental results on Godson-3, the overall log size of LReplay is about 0.55B/K-Inst (byte per k-instruction) for sequential consistency, and 0.85B/K-Inst for Godson-3 consistency. The log size is smaller in an order of magnitude than state-of-art deterministic replay schemes incuring no performance loss. Furthermore, LReplay only consumes about $1.3%$ area of Godson-3, since it requires only trivial modifications to the existing components of Godson-3. The above features of LReplay demonstrate the potential of integrating hardware-assisted deterministic replay into future industrial processors.