We present a new architecture-level unified reliability evaluation methodology for chip multiprocessors (CMPs). The proposed reliability estimation tool (REST) is based on a Monte Carlo algorithm. What distinguishes REST from previous work is that the computational and communication components are considered in a unified manner when computing the reliability of the CMP. We use the REST tool to develop a new dynamic reliability management (DRM) scheme that addresses the time-dependent dielectric breakdown and negative-bias temperature instability aging mechanisms in network-on-chip (NoC) based CMPs. Designed as a control loop, the proposed DRM scheme uses an effective neural-network-based reliability estimation module, whose predictor is trained using the REST tool. We investigate how the system's estimated lifetime changes depending on whether the NoC, as the communication unit of the CMP, is included in the reliability evaluation, and find that the difference can be as high as 60%. Full-system simulations using a customized GEM5 simulator show that, in a best-effort scenario with a user-set target lifetime of 7 years, the proposed DRM scheme improves reliability by up to 52% at a 2-9% performance penalty compared with the case where no DRM is employed.
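As a rough illustration of why including the NoC changes a Monte Carlo lifetime estimate, the following sketch samples component lifetimes and averages the time to first failure. It is not the REST algorithm: the Weibull lifetime model, the series failure criterion (the system fails when its first component fails), the parameter values, and all function names are assumptions chosen for illustration only.

```python
import random


def sample_system_lifetime(scales, shape, rng):
    """Draw one system lifetime under an assumed series model:
    the system fails as soon as any single component fails."""
    return min(rng.weibullvariate(scale, shape) for scale in scales)


def estimate_mttf(scales, shape=2.0, trials=20000, seed=0):
    """Monte Carlo estimate of mean time to failure (in years)."""
    rng = random.Random(seed)
    total = sum(sample_system_lifetime(scales, shape, rng) for _ in range(trials))
    return total / trials


# Hypothetical Weibull scale parameters (years) for a 4-core CMP.
core_scales = [30.0] * 4
router_scales = [40.0] * 4

# Lifetime estimate when only the cores are modeled.
mttf_cores_only = estimate_mttf(core_scales)
# Lifetime estimate when cores and NoC routers are modeled together.
mttf_unified = estimate_mttf(core_scales + router_scales)

print(f"cores only: {mttf_cores_only:.2f} years")
print(f"cores + NoC: {mttf_unified:.2f} years")
```

Under these assumptions the unified estimate is always the shorter one, since every added router is another component that can fail first. The size of the gap depends entirely on the assumed parameters and says nothing about the 60% figure reported above.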