Soft real-time systems often have to satisfy both timing and probabilistic fault-tolerance requirements. When checkpointing is used for fault tolerance, the checkpointing frequency directly affects the system's overall quality, measured as an integrated value of system QoS properties such as availability, task execution time, and task deadline miss probability. In this paper, we first formally analyze the relationship between the checkpoint interval and each of system availability, task execution time, and task deadline miss probability under a Poisson fault model. We then define the system's overall quality as a weighted sum of these three QoS measures and formulate an optimization problem to determine the checkpoint interval that maximizes it. We also present a prototype implementation of a framework that supports adaptive checkpointing, together with a set of experiments run on the framework that further validate our analytical results.
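To make the weighted-sum formulation concrete, the following Python sketch searches for a checkpoint interval that maximizes a combined quality score. It is an illustration only, not the paper's model: the three QoS formulas are textbook-style assumptions (the expected completion time of a restartable segment under Poisson faults, useful work divided by expected time as a stand-in for availability, and a conservative Poisson-tail bound on the deadline miss probability). All function names, parameters, and the equal default weights are hypothetical.

```python
import math

def expected_time(T, W, C, R, lam):
    """Assumed model: W/T segments of length T + C, each restarted after a
    fault (rate lam) following a fault-free recovery of length R."""
    segments = W / T
    return segments * (1.0 / lam + R) * math.expm1(lam * (T + C))

def miss_probability_bound(T, W, C, R, lam, D):
    """Conservative estimate: the deadline D is missed only if more faults
    arrive in [0, D] than the slack can absorb, each costing at most T+C+R."""
    fault_free = W + math.ceil(W / T) * C
    if fault_free > D:
        return 1.0
    k_max = int((D - fault_free) // (T + C + R))
    mean = lam * D
    term = math.exp(-mean)          # P(N = 0) for N ~ Poisson(mean)
    cdf = term
    for k in range(1, k_max + 1):
        term *= mean / k            # P(N = k) from P(N = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def quality(T, W, C, R, lam, D, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of three illustrative QoS scores, each in [0, 1]."""
    e = expected_time(T, W, C, R, lam)
    availability = W / e                      # share of time doing useful work
    time_score = max(0.0, 1.0 - e / D)        # shorter expected time is better
    on_time = 1.0 - miss_probability_bound(T, W, C, R, lam, D)
    w_a, w_t, w_d = weights
    return w_a * availability + w_t * time_score + w_d * on_time

def best_interval(W, C, R, lam, D, steps=200, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Grid search over candidate intervals in (0, W]."""
    candidates = [W * i / steps for i in range(1, steps + 1)]
    return max(candidates, key=lambda T: quality(T, W, C, R, lam, D, weights))

if __name__ == "__main__":
    # Hypothetical workload: 100 time units of work, checkpoint cost 1,
    # recovery cost 2, fault rate 0.01 per time unit, deadline 150.
    T_star = best_interval(W=100.0, C=1.0, R=2.0, lam=0.01, D=150.0)
    print("chosen checkpoint interval:", T_star)
```

A grid search is used here only for simplicity. With the assumed formulas, very short intervals are penalized by checkpoint overhead and very long ones by rework after a fault, so the chosen interval falls strictly between the two extremes for the example parameters.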