Cheap hardware support for software debugging and profiling
ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
A software instruction counter
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Hypervisor-based fault tolerance
ACM Transactions on Computer Systems (TOCS) - Special issue on operating system principles
Support for Software Interrupts in Log-Based Rollback-Recovery
IEEE Transactions on Computers
A survey of rollback-recovery protocols in message-passing systems
ACM Computing Surveys (CSUR)
Gprof: A call graph execution profiler
SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction
Xen and the art of virtualization
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Virtual Machines: Versatile Platforms for Systems and Processes (The Morgan Kaufmann Series in Computer Architecture and Design)
Transparent, lightweight application execution replay on commodity multiprocessor operating systems
Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Hi-index | 0.00 |
Log-based recovery protocols enable process replicas in distributed systems to replay a computation up to the point where a previous computation failed. One fundamental assumption underlying these protocols is the piecewise deterministic (PWD) execution model, stating that recovery must not execute, but simulate the execution of nondeterministic events in order to maintain consistency. One such source of nondeterminism are asynchronous events triggering software signal handlers, an issue known to be solved by instruction counters. Efficient implementations in software have been shown to be practical, but require significant changes to applications and system software. Hardware counters, in contrast, allow running software unmodified. A number of processors implementing the Intel x86 instruction set architecture provide monitoring registers with properties similar to a true instruction counter. Designed for application profiling, these facilities reveal a number issues to be resolved when utilized for applications like the PWD model, which demands for a maximum in precision during replay. We discuss some of the most prominent problems faced when using performance counters for protocols satisfying the PWD model. We present additional hardware mechanisms, eliminating inconsistencies in counter interrupt delivery, based on standard processor debugging facilities, and at the expense of a small number of additionally generated exceptions.