Adaptive optimal checkpoint interval and its impact on system's overall quality in soft real-time applications

Authors:
Nianen Chen;Shangping Ren
Affiliations:
Illinois Institute of Technology, Chicago, IL;Illinois Institute of Technology, Chicago, IL
Venue:
Proceedings of the 2009 ACM symposium on Applied Computing
Year:
2009

Abstract

Soft real-time systems often have to consider both timing and probabilistic fault-tolerance requirements. When checkpointing is used for fault tolerance, the checkpointing frequency inevitably affects the system's overall quality, measured as an integrated value of system QoS properties such as availability, task execution time, and task deadline miss probability. In this paper, we first formally analyze how the checkpoint interval relates to each of system availability, task execution time, and task deadline miss probability under a Poisson probabilistic fault model. We then define the system's overall quality as a weighted sum of these three QoS measures and formulate an optimization problem to find the checkpoint interval that maximizes the system's overall quality. We also present a prototype implementation of a framework that supports adaptive checkpointing, together with a set of experiments run on the framework that further validate our analytical results.
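
The weighted-sum formulation described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the segment-time expression is the textbook restart model for exponentially distributed fault arrivals (no faults during recovery), the deadline miss probability is a deliberately crude Poisson-count approximation, and the names `lam`, `C`, `R`, `W`, `D`, and `weights` are hypothetical parameters (fault rate, checkpoint cost, recovery cost, task work, deadline, QoS weights).

```python
import math

def expected_segment_time(T, lam, C, R):
    """Expected time to complete T units of work plus a checkpoint of cost C.
    Assumed textbook model: Poisson faults with rate lam, each fault costs R
    to recover and restarts the segment; no faults occur during recovery."""
    return (1.0 / lam + R) * math.expm1(lam * (T + C))

def availability(T, lam, C, R):
    """Fraction of wall-clock time spent on useful work (assumed definition)."""
    return T / expected_segment_time(T, lam, C, R)

def expected_execution_time(T, lam, C, R, W):
    """Expected time to finish W units of work split into W/T segments."""
    return (W / T) * expected_segment_time(T, lam, C, R)

def miss_probability(T, lam, C, R, W, D):
    """Crude approximation (an assumption, not the paper's analysis): each
    fault costs R plus half a segment of rework, and the number of faults
    before the deadline D is Poisson with mean lam * D."""
    fault_free = W + (W / T) * C          # run time if no fault occurs
    slack = D - fault_free
    if slack < 0:
        return 1.0                        # deadline cannot be met at all
    per_fault = R + (T + C) / 2.0
    k_max = int(slack // per_fault)       # faults tolerable within the slack
    mean = lam * D
    term = math.exp(-mean)                # Poisson pmf at k = 0
    cdf = term
    for k in range(1, k_max + 1):
        term *= mean / k
        cdf += term
    return max(0.0, 1.0 - cdf)

def quality(T, lam, C, R, W, D, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of three scores in [0, 1]; higher is better."""
    w_a, w_t, w_m = weights
    time_score = 1.0 - min(1.0, expected_execution_time(T, lam, C, R, W) / D)
    return (w_a * availability(T, lam, C, R)
            + w_t * time_score
            + w_m * (1.0 - miss_probability(T, lam, C, R, W, D)))

def best_interval(lam, C, R, W, D, weights=(1 / 3, 1 / 3, 1 / 3), steps=400):
    """Grid search over checkpoint intervals in (0, W]."""
    candidates = [W * i / steps for i in range(1, steps + 1)]
    return max(candidates, key=lambda T: quality(T, lam, C, R, W, D, weights))
```

Calling `best_interval(0.01, 1.0, 2.0, 100.0, 150.0)` returns the grid point with the highest toy quality score for those assumed parameters. A grid search is used here only because the crude miss-probability term is a step function of the interval; an analytical model such as the one the abstract describes may admit a closed-form or gradient-based solution instead.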