Hardware instruction counting for log-based rollback recovery on x86-family processors

Authors:
Daniel Stodden;Hubert Eichner;Max Walter;Carsten Trinitis
Affiliations:
Technische Universität München;Technische Universität München;Technische Universität München;Technische Universität München
Venue:
ISAS'06 Proceedings of the Third international conference on Service Availability
Year:
2006

Abstract

Log-based recovery protocols enable process replicas in distributed systems to replay a computation up to the point where a previous computation failed. One fundamental assumption underlying these protocols is the piecewise deterministic (PWD) execution model, which states that recovery must not execute nondeterministic events but simulate their execution in order to maintain consistency. One such source of nondeterminism is asynchronous events that trigger software signal handlers, an issue known to be solved by instruction counters. Efficient implementations in software have been shown to be practical, but require significant changes to applications and system software. Hardware counters, in contrast, allow software to run unmodified. A number of processors implementing the Intel x86 instruction set architecture provide monitoring registers with properties similar to those of a true instruction counter. Designed for application profiling, these facilities reveal a number of issues that must be resolved when they are used for applications like the PWD model, which demands maximum precision during replay. We discuss some of the most prominent problems faced when using performance counters for protocols satisfying the PWD model. We present additional hardware mechanisms, based on standard processor debugging facilities, that eliminate inconsistencies in counter interrupt delivery at the expense of a small number of additionally generated exceptions.
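
The role of the instruction counter can be illustrated with a toy simulation. The following is a minimal sketch under stated assumptions, not the mechanism presented in the paper: the "processor" is a Python loop, the "signal handler" doubles a single state variable, and the names `run`, `record`, and `replay` are hypothetical. It shows only that logging the instruction count at which an asynchronous event was delivered, and re-delivering the event at exactly that count, reproduces the original final state, whereas delivery one instruction off does not.

```python
import random


def run(n_instructions, signal_log):
    """Execute a toy program of n_instructions steps.

    signal_log maps an instruction count to True; a signal is delivered
    just before the instruction with that count executes.
    """
    state = 0
    for count in range(n_instructions):
        if count in signal_log:
            state *= 2  # toy signal handler (assumption: order-sensitive)
        state += 1      # toy "instruction"
    return state


def record(n_instructions, rng):
    """Original run: the signal arrives at an unpredictable count.

    Returns the final state and the log needed for replay.
    """
    arrival = rng.randrange(n_instructions)
    log = {arrival: True}
    return run(n_instructions, log), log


def replay(n_instructions, log):
    """Recovery run: the event is not re-executed nondeterministically,
    but re-delivered at the logged instruction count."""
    return run(n_instructions, log)


if __name__ == "__main__":
    final, log = record(1000, random.Random())
    print("replay matches original:", replay(1000, log) == final)
```

In this sketch a single signal delivered at count k yields a final state of n + k for a program of n instructions, so any deviation in the delivery point changes the result. This is the precision requirement that profiling-oriented performance counters have to meet during replay.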