Fault-tolerant average execution time optimization for general-purpose multi-processor system-on-chips

Authors:
Mikael Väyrynen;Virendra Singh;Erik Larsson
Affiliations:
Linköping University, Sweden;Indian Institute of Science, India;Linköping University, Sweden
Venue:
Proceedings of the Conference on Design, Automation and Test in Europe
Year:
2009

Abstract

Owing to developments in semiconductor technology, fault tolerance is important not only for safety-critical systems but also for general-purpose (non-safety-critical) systems. For general-purpose systems, however, the goal is not to guarantee that deadlines are always met but to minimize the average execution time (AET) while ensuring fault tolerance. For a given job and a soft (transient) error probability, we define mathematical formulas for AET that include bus communication overhead for both voting (active replication) and rollback-recovery with checkpointing (RRC). For a given multi-processor system-on-chip (MPSoC), we define integer linear programming (ILP) models that minimize AET, including bus communication overhead, when: (1) selecting the number of checkpoints when using RRC, (2) finding the number of processors and the job-to-processor assignment when using voting, and (3) selecting the fault-tolerance scheme (voting or RRC) per job and defining its usage for each job. Experiments demonstrate significant savings in AET.
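
The trade-off behind item (1) can be illustrated with a minimal sketch. It is not the paper's AET formula or its ILP model, neither of which appears in this abstract. The sketch assumes a simplified retry model: a job with fault-free execution time T is split into n equal segments, each segment is followed by a checkpoint costing tau time units (standing in for checkpointing and bus communication overhead), errors occur independently with probability p per time unit, and a segment is re-executed until it completes without error. All names (`aet_rrc`, `best_checkpoint_count`, `T`, `n`, `p`, `tau`, `n_max`) are hypothetical, and a plain exhaustive search replaces the ILP.

```python
def aet_rrc(T, n, p, tau):
    """Average execution time under an ASSUMED simplified RRC model.

    T   : fault-free execution time of the job (time units)
    n   : number of equal segments, one checkpoint after each segment
    p   : probability of a soft error per time unit (assumed independent)
    tau : overhead per checkpoint, including bus communication (assumed constant)
    """
    segment = T / n
    # Probability that one segment executes without any error.
    p_ok = (1.0 - p) ** segment
    # A segment is retried until it succeeds, so the expected number of
    # attempts is 1 / p_ok (mean of a geometric distribution).
    return n * (segment + tau) / p_ok


def best_checkpoint_count(T, p, tau, n_max=100):
    """Exhaustive search over 1..n_max; a stand-in for the ILP, not the ILP itself."""
    return min(range(1, n_max + 1), key=lambda n: aet_rrc(T, n, p, tau))


if __name__ == "__main__":
    T, p, tau = 100.0, 0.01, 5.0  # illustrative values, not taken from the paper
    n = best_checkpoint_count(T, p, tau)
    print(n, aet_rrc(T, n, p, tau))
```

In this model, more checkpoints shorten the work lost to each error but add overhead, so AET is minimized at an intermediate checkpoint count. With no errors (p = 0) a single segment is best.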