Algorithms for Scheduling Imprecise Computations
Computer - Special issue on real-time systems
FINE: A Fault Injection and Monitoring Environment for Tracing the UNIX System Behavior Under Faults
IEEE Transactions on Software Engineering - Special issue on software reliability
Reliable and energy-efficient digital signal processing
Proceedings of the 39th annual Design Automation Conference
Concurrent Error Detection Using Watchdog Processors-A Survey
IEEE Transactions on Computers
A survey of techniques for energy efficient on-chip communication
Proceedings of the 40th annual Design Automation Conference
An ACS Robotic Control Algorithm with Fault Tolerant Capabilities
FCCM '00 Proceedings of the 2000 IEEE Symposium on Field-Programmable Custom Computing Machines
Acceptability-oriented computing
OOPSLA '03 Companion of the 18th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Design and reliability challenges in nanometer technologies
Proceedings of the 41st annual Design Automation Conference
Multi-media Applications and Imprecise Computation
DSD '05 Proceedings of the 8th Euromicro Conference on Digital System Design
Ultra-efficient (embedded) SOC architectures based on probabilistic CMOS (PCMOS) technology
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Dynamic Derivation of Application-Specific Error Detectors and their Implementation in Hardware
EDCC '06 Proceedings of the Sixth European Dependable Computing Conference
An Analysis Framework for Transient-Error Tolerance
VTS '07 Proceedings of the 25th IEEE VLSI Test Symmposium
Algorithm-Based Fault Tolerance for Matrix Operations
IEEE Transactions on Computers
Trading off Cache Capacity for Reliability to Enable Low Voltage Operation
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
A case study in reliability-aware design: a resilient LDPC code decoder
Proceedings of the conference on Design, automation and test in Europe
Globally optimized robust systems to overcome scaled CMOS reliability challenges
Proceedings of the conference on Design, automation and test in Europe
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Proceedings of the 47th Design Automation Conference
Best-effort computing: re-thinking parallel software and hardware
Proceedings of the 47th Design Automation Conference
What is stochastic computation?
ACM SIGDA Newsletter
Cross-layer resilience challenges: metrics and optimization
Proceedings of the Conference on Design, Automation and Test in Europe
Exploring the fidelity-efficiency design space using imprecise arithmetic
Proceedings of the 16th Asia and South Pacific Design Automation Conference
EnerJ: approximate data types for safe and general low-power computation
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Significance driven computation on next-generation unreliable platforms
Proceedings of the 48th Design Automation Conference
Dynamic effort scaling: managing the quality-efficiency tradeoff
Proceedings of the 48th Design Automation Conference
Design and architectures for dependable embedded systems
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
MACACO: modeling and analysis of circuits for approximate computing
Proceedings of the International Conference on Computer-Aided Design
Cross-layer error resilience for robust systems
Proceedings of the International Conference on Computer-Aided Design
Architecture support for disciplined approximate programming
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
A case study on error resilient architectures for wireless communication
ARCS'12 Proceedings of the 25th international conference on Architecture of Computing Systems
SALSA: systematic logic synthesis of approximate circuits
Proceedings of the 49th Annual Design Automation Conference
Incorrect systems: it's not the problem, it's the solution
Proceedings of the 49th Annual Design Automation Conference
Fault resilience of the algebraic multi-grid solver
Proceedings of the 26th ACM international conference on Supercomputing
Selectively fortifying reconfigurable computing device to achieve higher error resilience
Journal of Electrical and Computer Engineering - Special issue on ESL Design Methodology
Proceedings of the 7th International Conference on Body Area Networks
Neural Acceleration for General-Purpose Approximate Programs
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Ten Years of Building Broken Chips: The Physics and Engineering of Inexact Computing
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Probabilistic Embedded Computing
Managing the Quality vs. Efficiency Trade-off Using Dynamic Effort Scaling
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Probabilistic Embedded Computing
Synthesizing Parsimonious Inexact Circuits through Probabilistic Design Techniques
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Probabilistic Embedded Computing
Extracting useful computation from error-prone processors for streaming applications
Proceedings of the Conference on Design, Automation and Test in Europe
Hierarchically focused guardbanding: an adaptive approach to mitigate PVT variations and aging
Proceedings of the Conference on Design, Automation and Test in Europe
Reliable on-chip systems in the nano-era: lessons learnt and future trends
Proceedings of the 50th Annual Design Automation Conference
Analysis and characterization of inherent application resilience for approximate computing
Proceedings of the 50th Annual Design Automation Conference
A hybrid HW-SW approach for intermittent error mitigation in streaming-based embedded systems
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Verifying quantitative reliability for programs that execute on unreliable hardware
Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
Quality programmable vector processors for approximate computing
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Approximate storage in solid-state memories
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
On reconfiguration-oriented approximate adder design and its application
Proceedings of the International Conference on Computer-Aided Design
A survey of cross-layer power-reliability tradeoffs in multi and many core systems-on-chip
Microprocessors & Microsystems
Hi-index | 0.00 |
There is a growing concern about the increasing vulnerability of future computing systems to errors in the underlying hardware. Traditional redundancy techniques are expensive for designing energy-efficient systems that are resilient to high error rates. We present Error Resilient System Architecture (ERSA), a low-cost robust system architecture for emerging killer probabilistic applications such as Recognition, Mining and Synthesis (RMS) applications. While resilience of such applications to errors in low-order bits of data is well-known, execution of such applications on error-prone hardware significantly degrades output quality (due to high-order bit errors and crashes). ERSA achieves high error resilience to high-order bit errors and control errors (in addition to low-order bit errors) using a judicious combination of 3 key ideas: (1) asymmetric reliability in many-core architectures, (2) error-resilient algorithms at the core of probabilistic applications, and (3) intelligent software optimizations. Error injection experiments on a multi-core ERSA hardware prototype demonstrate that, even at very high error rates of 20,000 errors/second/core or 2x10-4 error/cycle/core (with errors injected in architecturally-visible registers), ERSA maintains 90% or better accuracy of output results, together with minimal impact on execution time, for probabilistic applications such as K-Means clustering, LDPC decoding and Bayesian networks. Moreover, we demonstrate the effectiveness of ERSA in tolerating high rates of static memory errors that are characteristic of emerging challenges such as Vccmin problems and erratic bit errors. Using the concept of configurable reliability, ERSA platforms may also be adapted for general-purpose applications that are less resilient to errors (but at higher costs).