IBM experiments in soft fails in computer electronics (1978–1994)
IBM Journal of Research and Development - Special issue: terrestrial cosmic rays and soft errors
IBM Journal of Research and Development - Special issue: terrestrial cosmic rays and soft errors
Detailed design and evaluation of redundant multithreading alternatives
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Chromium: a stream-processing framework for interactive rendering on clusters
Proceedings of the 29th annual conference on Computer graphics and interactive techniques
The Impact of Technology Scaling on Lifetime Reliability
DSN '04 Proceedings of the 2004 International Conference on Dependable Systems and Networks
The Soft Error Problem: An Architectural Perspective
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Dynamic prediction of architectural vulnerability from microarchitectural state
Proceedings of the 34th annual international symposium on Computer architecture
Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
Mixed-mode multicore reliability
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Understanding software approaches for GPGPU reliability
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
Dynamic heterogeneity and the need for multicore virtualization
ACM SIGOPS Operating Systems Review
Using hardware vulnerability factors to enhance AVF analysis
Proceedings of the 37th annual international symposium on Computer architecture
Hard Data on Soft Errors: A Large-Scale Assessment of Real-World Error Rates in GPGPU
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Hi-index | 0.00 |
With shrinking process technology, the primary cause of transient faults in semiconductors shifts away from high-energy cosmic particle strikes and toward more mundane and pervasive causes---power fluctuations, crosstalk, and other random noise. Smaller transistor features require a lower critical charge to hold and change bits, which leads to faster microprocessors, but which also leads to higher transient fault rates. Current trends, expected to continue, show soft error rates increasing exponentially at a rate of 8% per technology generation. Existing transient fault research in general-purpose architecture, like the well-established architectural vulnerability factor (AVF), assume that all computations are equally important and all errors equally intolerable. However, we observe that the effect of transient faults in graphics processing can range from imperceptible, to bothersome visual artifacts, to critical loss of function. We therefore extend and generalize the AVF by introducing the Visual Vulnerability Spectrum (VVS). We apply the VVS to analyze the effect of increased transient error rate on graphics processors. With this analysis in hand, we suggest several targeted, inexpensive solutions that can mitigate the most egregious of soft error consequences.