In this paper, we propose automated algorithmic error resilience based on outlier detection. Our approach exploits the characteristic behavior of a class of applications to create metric functions that, under normal operation, produce values following a designed distribution or behavior, and that produce outliers (values that do not conform to that distribution or behavior) when computations are affected by errors. For a robust algorithm that uses such an approach, error detection becomes equivalent to outlier detection. We can therefore apply well-established, statistically rigorous outlier detection techniques to detect errors effectively and efficiently, and subsequently correct them. Our error-resilient algorithms incur significantly lower overhead than traditional hardware and software error resilience techniques. Moreover, compared to previous approaches to application-based error resilience, our approach parameterizes the robustification process, making it easy to automatically transform large classes of applications into robust applications using parser-based tools and minimal programmer effort. We demonstrate automated error resilience based on outlier detection for structured grid problems, leveraging the flexibility of algorithmic error resilience to achieve better application robustness and lower overhead than previous error resilience approaches. For non-iterative structured grid problems, we demonstrate a 2× to 3× improvement in output quality over the original algorithm with only 22% overhead on average. For error-resilient iterative structured grid algorithms, average overhead is as low as 4.5%; these algorithms tolerate error rates up to 10⁻³ and achieve the same output quality as their error-free counterparts.
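The abstract does not specify the authors' metric functions or outlier detector, so the following is only a minimal sketch of the general idea under our own assumptions. A single 5-point Jacobi sweep stands in for a non-iterative structured grid problem. The hypothetical metric is each interior output cell's deviation from the mean of its four neighbours, which stays small and tightly clustered for a smooth field. Outliers are flagged with a median absolute deviation (modified z-score) test, a standard robust detector swapped in here for illustration, with the commonly used threshold of 3.5. Flagged cells are corrected by recomputing them from the input, assuming the input grid itself was not corrupted. All function names (`stencil`, `jacobi_sweep`, `laplacian_metric`, `detect_outliers`, `correct`) are hypothetical.

```python
import math
import statistics

Grid = list[list[float]]


def stencil(grid: Grid, i: int, j: int) -> float:
    """5-point average of the four neighbours of interior cell (i, j)."""
    return 0.25 * (grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1])


def jacobi_sweep(grid: Grid) -> Grid:
    """One Jacobi sweep; boundary cells are copied unchanged."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = stencil(grid, i, j)
    return out


def laplacian_metric(out: Grid) -> dict[tuple[int, int], float]:
    """Hypothetical metric: interior output value minus the mean of its
    neighbours. Small and clustered for smooth fields; a corrupted cell
    (and, more weakly, its neighbours) yields outlying values."""
    rows, cols = len(out), len(out[0])
    return {
        (i, j): out[i][j] - stencil(out, i, j)
        for i in range(1, rows - 1)
        for j in range(1, cols - 1)
    }


def detect_outliers(metric: dict[tuple[int, int], float],
                    threshold: float = 3.5,
                    mad_floor: float = 1e-9) -> set[tuple[int, int]]:
    """Flag cells whose modified z-score exceeds the threshold.
    mad_floor avoids division by zero when all metric values coincide."""
    values = list(metric.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    scale = max(mad, mad_floor)
    return {cell for cell, v in metric.items()
            if 0.6745 * abs(v - med) / scale > threshold}


def correct(grid: Grid, out: Grid, flagged: set[tuple[int, int]]) -> Grid:
    """Recompute flagged cells from the (assumed intact) input grid."""
    fixed = [row[:] for row in out]
    for i, j in flagged:
        fixed[i][j] = stencil(grid, i, j)
    return fixed


if __name__ == "__main__":
    n = 16
    grid = [[math.sin(0.3 * i) + math.cos(0.2 * j) for j in range(n)]
            for i in range(n)]
    clean = jacobi_sweep(grid)
    faulty = [row[:] for row in clean]
    faulty[7][8] += 100.0  # simulated silent data corruption
    flagged = detect_outliers(laplacian_metric(faulty))
    repaired = correct(grid, faulty, flagged)
    print((7, 8) in flagged, repaired == clean)
```

Because a corrupted value also perturbs the metric of its four neighbours, those cells may be flagged too; recomputing them is harmless and costs only a few extra stencil evaluations, which is one way such a scheme can keep correction overhead low.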