Diagnosis and Repair of Memory with Coupling Faults
IEEE Transactions on Computers
Understanding the Linux Virtual Memory Manager
Understanding the Linux Virtual Memory Manager
Software Based In-System Memory Test for Highly Available Systems
MTDT '05 Proceedings of the 2005 IEEE International Workshop on Memory Technology, Design, and Testing
The slab allocator: an object-caching kernel memory allocator
USTC'94 Proceedings of the USENIX Summer 1994 Technical Conference on USENIX Summer 1994 Technical Conference - Volume 1
Detection oF Pattern-Sensitive Faults in Random-Access Memories
IEEE Transactions on Computers
Fault Location in a Semiconductor Random-Access Memory Unit
IEEE Transactions on Computers
Efficient Algorithms for Testing Semiconductor Random-Access Memories
IEEE Transactions on Computers
DRAM errors in the wild: a large-scale field study
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Virtualized and flexible ECC for main memory
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Cycles, cells and platters: an empirical analysisof hardware failures on a million consumer PCs
Proceedings of the sixth conference on Computer systems
RAMpage: Graceful Degradation Management for Memory Errors in Commodity Linux Servers
PRDC '11 Proceedings of the 2011 IEEE 17th Pacific Rim International Symposium on Dependable Computing
COMeT: Continuous Online Memory Test
PRDC '11 Proceedings of the 2011 IEEE 17th Pacific Rim International Symposium on Dependable Computing
Practical scrubbing: Getting to the bad sector at the right time
DSN '12 Proceedings of the 2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
Hi-index | 0.00 |
Memory errors are a major source of reliability problems in computer systems. Undetected errors may result in program termination or, even worse, silent data corruption. Recent studies have shown that the frequency of permanent memory errors is an order of magnitude higher than previously assumed and regularly affects everyday operation. To reduce the impact of memory errors, we designed RAMpage, a purely software-based infrastructure to assess and circumvent permanent memory errors in a running commodity x86-64 Linux-based system. We briefly describe the design and implementation of RAMpage and present new results from an extensive qualitative and quantitative evaluation. These results show the efficiency of our approach - RAMpage is able to provide a smooth graceful degradation in the presence of permanent memory errors while requiring only a small overhead in terms of CPU time, energy, and memory space.