Aging-aware hardware-software task partitioning for reliable reconfigurable multiprocessor systems

Authors:
Anup Das;Akash Kumar;Bharadwaj Veeravalli
Affiliations:
National University of Singapore, Singapore;National University of Singapore, Singapore;National University of Singapore, Singapore
Venue:
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Year:
2013


Abstract

Homogeneous multiprocessor systems with a reconfigurable area (also known as reconfigurable multiprocessor systems) are emerging as a popular design choice in current and future technology nodes to meet the heterogeneous computing demands of the many applications enabled on these platforms. Application-specific mapping decisions on such a platform involve partitioning a given application into software tasks (executed on one or more of the general-purpose processors, GPPs) and hardware tasks (realized as dedicated hardware on the reconfigurable area) to optimize and/or satisfy design constraints such as reliability, performance, and design cost. Improving reliability against transient faults by increasing the number of checkpoints negatively impacts reliability against permanent faults. This trade-off is ignored in all prior studies on task mapping and scheduling. This paper proposes an optimization technique that determines the optimal number of checkpoints for the software tasks, minimizing aging of the GPPs while maximizing the transient fault-tolerance of the overall platform (GPPs and the reconfigurable area) and satisfying design cost and performance constraints. Experiments conducted with synthetic and real-life application task graphs (cyclic and acyclic) demonstrate that the proposed technique minimizes aging and improves the platform lifetime by an average of 60% compared to existing transient fault-aware techniques. Further, a gradient-based heuristic is proposed that reduces the design space exploration time by up to 500× with less than 5% deviation from the optimal solution.
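
The checkpoint trade-off described in the abstract can be illustrated with a small sketch. This is a toy model, not the paper's formulation: it assumes a single software task split into equal segments, a fixed per-checkpoint cost, a fixed recovery cost, and a fault count to tolerate. It uses fault-free busy time as a crude stand-in for aging stress on a GPP. All function names and parameters are hypothetical.

```python
from typing import Optional


def worst_case_time(exec_time: float, n: int, ckpt_cost: float,
                    recovery_cost: float, k_faults: int) -> float:
    """Worst-case completion time under the toy model.

    The task is split into n equal segments with one checkpoint each.
    Every fault forces re-execution of one segment plus a recovery cost.
    """
    segment = exec_time / n
    return exec_time + n * ckpt_cost + k_faults * (segment + recovery_cost)


def busy_time(exec_time: float, n: int, ckpt_cost: float) -> float:
    """Fault-free busy time; assumed here to be a proxy for aging stress."""
    return exec_time + n * ckpt_cost


def pick_checkpoints(exec_time: float, ckpt_cost: float, recovery_cost: float,
                     k_faults: int, deadline: float,
                     max_n: int = 64) -> Optional[int]:
    """Return the checkpoint count with the least busy time that still
    meets the deadline in the worst case, or None if no count is feasible."""
    feasible = [
        n for n in range(1, max_n + 1)
        if worst_case_time(exec_time, n, ckpt_cost, recovery_cost, k_faults) <= deadline
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda n: busy_time(exec_time, n, ckpt_cost))


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    n = pick_checkpoints(exec_time=100, ckpt_cost=2, recovery_cost=1,
                         k_faults=2, deadline=150)
    print("chosen checkpoints:", n)
```

In this sketch, adding checkpoints shortens the segment that must be re-executed after a transient fault, but every extra checkpoint also adds busy time in the fault-free case. Picking the smallest feasible count reflects the idea of limiting wear while still tolerating the assumed number of faults within the deadline.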