Advances in semiconductor technology make fault tolerance important not only for safety-critical systems but also for general-purpose (non-safety-critical) systems. For general-purpose systems, however, the goal is not to guarantee that deadlines are always met but to minimize the average execution time (AET) while ensuring fault tolerance. For a given job and a given soft (transient) error probability, we define mathematical formulas for the AET, including bus communication overhead, for both voting (active replication) and rollback-recovery with checkpointing (RRC). For a given multi-processor system-on-chip (MPSoC), we then define integer linear programming (ILP) models that minimize the AET, including bus communication overhead, when: (1) selecting the number of checkpoints when using RRC, (2) finding the number of processors and the job-to-processor assignment when using voting, and (3) selecting the fault-tolerance scheme (voting or RRC) for each job and defining its usage. Experiments demonstrate significant savings in AET.
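The trade-off behind choosing the number of checkpoints for RRC can be illustrated with a small sketch. It assumes a simplified model that is not the paper's exact formula: the fault-free job time is split into equal segments, each segment fails independently, every attempt pays a fixed overhead (checkpointing, comparison and bus transfer lumped into one hypothetical `overhead` parameter), and a failed segment is rolled back and re-executed until it succeeds. The helper names `aet_rrc` and `best_checkpoint_count` are hypothetical, and the exhaustive search is a simple stand-in for the ILP formulation, not a reproduction of it.

```python
def aet_rrc(t_job: float, p_success: float, n_checkpoints: int, overhead: float) -> float:
    """Expected execution time of one job under rollback-recovery with checkpointing.

    Assumed (illustrative) model:
    - t_job is the fault-free execution time, split into n_checkpoints equal segments;
    - p_success is the probability that the whole job runs error-free, so each
      segment succeeds independently with probability p_success ** (1 / n);
    - every attempt of a segment also pays a fixed overhead;
    - the number of attempts per segment is geometric with mean 1 / p_segment.
    """
    if n_checkpoints < 1:
        raise ValueError("need at least one checkpoint")
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    segment_cost = t_job / n_checkpoints + overhead
    p_segment = p_success ** (1.0 / n_checkpoints)
    return n_checkpoints * segment_cost / p_segment


def best_checkpoint_count(t_job: float, p_success: float, overhead: float,
                          max_checkpoints: int = 100) -> int:
    """Exhaustive search for the checkpoint count that minimizes the AET.

    A hypothetical stand-in for the ILP; returns the smallest minimizing n.
    """
    return min(range(1, max_checkpoints + 1),
               key=lambda n: aet_rrc(t_job, p_success, n, overhead))


if __name__ == "__main__":
    n = best_checkpoint_count(t_job=1000.0, p_success=0.5, overhead=1.0)
    print(n, aet_rrc(1000.0, 0.5, n, 1.0))
```

Under this model, more checkpoints shorten the work lost on each rollback but add overhead per segment. With no errors (`p_success = 1.0`) a single checkpoint is optimal, while a high error probability pushes the optimum toward more checkpoints.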