Fault-tolerance design of the IBM Enterprise System/9000 Type 9021 processors
IBM Journal of Research and Development
IBM experiments in soft fails in computer electronics (1978–1994)
IBM Journal of Research and Development - Special issue: terrestrial cosmic rays and soft errors
Field testing for cosmic ray soft errors in semiconductor memories
IBM Journal of Research and Development - Special issue: terrestrial cosmic rays and soft errors
Terrestrial cosmic ray intensities
IBM Journal of Research and Development
Symbol error correcting codes for memory applications
FTCS '96 Proceedings of the The Twenty-Sixth Annual International Symposium on Fault-Tolerant Computing (FTCS '96)
SEMM-2: a new generation of single-event-effect modeling tools
IBM Journal of Research and Development
New simulation methodology for effects of radiation in semiconductor chip structures
IBM Journal of Research and Development
Circuit design and modeling for soft errors
IBM Journal of Research and Development
Fault-tolerant design of the IBM pSeries 690 system using POWER4 processor technology
IBM Journal of Research and Development
RAS design for the IBM eServer z900
IBM Journal of Research and Development
Blue Gene/L compute chip: memory and Ethernet subsystem
IBM Journal of Research and Development
Virtualized and flexible ECC for main memory
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
IBM Journal of Research and Development
Review: A survey of memory error correcting techniques for improved reliability
Journal of Network and Computer Applications
System implications of memory reliability in exascale computing
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Hi-index | 0.00 |
While attention in the realm of computer design has shifted away from the classic DRAM soft-error rate (SER) and focused instead on SRAM and microprocessor latch sensitivities as sources of potential errors, DRAM SER nonetheless remains a challenging problem. This is true even though both cosmic ray-induced and alpha-particle-induced DRAM soft errors have been well modeled and, to a certain degree, well understood. However, the often-overlooked alignment of a DRAM hard error and a random soft error can have major reliability, availability, and serviceability (RAS) implications for systems that require an extremely long mean time between failures. The net of this effect is that what appears to be a well-behaved, single-bit soft error ends up overwhelming a seemingly state-of-the-art mitigation technique. This paper describes some of the history of DRAM soft-error discovery and the subsequent development of mitigation strategies. It then examines some architectural considerations that can exacerbate the effect of DRAM soft errors and may have system-level implications for today's standard fault-tolerance schemes.