Testing semiconductor memories: theory and practice
Testing semiconductor memories: theory and practice
Using SimPoint for accurate and efficient simulation
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Testing of Digital Systems
DSD '07 Proceedings of the 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Trading off Cache Capacity for Reliability to Enable Low Voltage Operation
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Memory Systems: Cache, DRAM, Disk
Memory Systems: Cache, DRAM, Disk
Architecting phase change memory as a scalable dram alternative
Proceedings of the 36th annual international symposium on Computer architecture
A durable and energy efficient main memory using phase change memory technology
Proceedings of the 36th annual international symposium on Computer architecture
Scalable high performance main memory system using phase-change memory technology
Proceedings of the 36th annual international symposium on Computer architecture
ZerehCache: armoring cache architectures in high defect density technologies
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Dynamically replicated memory: building reliable systems from nanoscale resistive memories
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Virtualized and flexible ECC for main memory
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Use ECP, not ECC, for hard failures in resistive memories
Proceedings of the 37th annual international symposium on Computer architecture
Elastic Refresh: Techniques to Mitigate Refresh Penalties in High Density Memory
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
SAFER: Stuck-At-Fault Error Recovery for Memories
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Energy-efficient cache design using variable-strength error-correcting codes
Proceedings of the 38th annual international symposium on Computer architecture
FREE-p: Protecting non-volatile memory against both hard and soft errors
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Archipelago: A polymorphic cache design for enabling robust near-threshold operation
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
RAIDR: Retention-Aware Intelligent DRAM Refresh
Proceedings of the 39th Annual International Symposium on Computer Architecture
Hi-index | 0.00 |
DRAM scaling has been the prime driver for increasing the capacity of main memory system over the past three decades. Unfortunately, scaling DRAM to smaller technology nodes has become challenging due to the inherent difficulty in designing smaller geometries, coupled with the problems of device variation and leakage. Future DRAM devices are likely to experience significantly high error-rates. Techniques that can tolerate errors efficiently can enable DRAM to scale to smaller technology nodes. However, existing techniques such as row/column sparing and ECC become prohibitive at high error-rates. To develop cost-effective solutions for tolerating high error-rates, this paper advocates a cross-layer approach. Rather than hiding the faulty cell information within the DRAM chips, we expose it to the architectural level. We propose ArchShield, an architectural framework that employs runtime testing to identify faulty DRAM cells. ArchShield tolerates these faults using two components, a Fault Map that keeps information about faulty words in a cache line, and Selective Word-Level Replication (SWLR) that replicates faulty words for error resilience. Both Fault Map and SWLR are integrated in reserved area in DRAM memory. Our evaluations with 8GB DRAM DIMM show that ArchShield can efficiently tolerate error-rates as higher as 10−4 (100x higher than ECC alone), causes less than 2% performance degradation, and still maintains 1-bit error tolerance against soft errors.