Design Methodology for a One-Shot Reed-Solomon Encoder and Decoder
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
The M5 Simulator: Modeling Networked Systems
IEEE Micro
Multi-bit Error Tolerant Caches Using Two-Dimensional Error Coding
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Overview of the IBM Blue Gene/P project
IBM Journal of Research and Development
DRAM errors in the wild: a large-scale field study
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Future scaling of processor-memory interfaces
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Rethinking DRAM design and organization for energy-constrained multi-cores
Proceedings of the 37th annual international symposium on Computer architecture
CPPC: correctable parity protected cache
Proceedings of the 38th annual international symposium on Computer architecture
Benchmarking modern multiprocessors
Benchmarking modern multiprocessors
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Pay-As-You-Go: low-overhead hard-error correction for phase change memories
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
LOT-ECC: localized and tiered reliability mechanisms for commodity memory systems
Proceedings of the 39th Annual International Symposium on Computer Architecture
A software memory partition approach for eliminating bank-level interference in multicore systems
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
A study of DRAM failures in the field
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Adaptive Reliability Chipkill Correct (ARCC)
HPCA '13 Proceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
Hi-index | 0.00 |
Due to their large memory capacities, many modern servers require chipkill correct, an advanced type of memory error detection and correction, to meet their reliability requirements. However, existing chipkill-correct solutions incur high power or storage overheads, or both because they use dedicated error-correction resources per codeword to perform error correction. This requires high overhead for correction and results in high overhead for error detection. We propose a novel chipkill-correct solution, multi-line error correction, that uses resources shared across multiple lines in memory for error correction to reduce the overhead of both error detection and correction. Our evaluations show that the proposed solution reduces memory power by a mean of 27%, and up to 38% with respect to commercial solutions, at a cost of 0.4% increase in storage overhead and minimal impact on reliability.