Modern computer systems are becoming more powerful and are using larger memories. However, except in very high-end systems, little attention is being paid to high availability. This is particularly true of transient memory errors, which typically cause the entire system to fail. We believe that this situation can be improved by addressing memory errors at all levels of the system, bringing commodity systems closer to mainframe-class availability. In this paper, we use fault injection experiments to investigate memory error susceptibility at the highest level of the system, using a JVM and four Java benchmark applications. We then consider checksums over JVM data structures as a way to increase detection of silent data corruption affecting the JVM and its applications. Our results indicate that the JVM's heap area is more susceptible to memory errors than its static data area, and that we can detect up to 39% of all memory errors in the JVM and application. We believe that such techniques will allow commodity systems to be made much more robust and less susceptible to transient errors.
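
The idea of guarding a data structure with a checksum and then injecting a fault into it can be sketched in a few lines of Java. This is only a minimal illustration under stated assumptions, not the mechanism evaluated in the paper: the class name `GuardedRegion`, its methods, and the choice of CRC32 are hypothetical, and the sketch works on an ordinary byte array rather than on internal JVM structures.

```java
import java.util.zip.CRC32;

/**
 * Hypothetical illustration: a byte region guarded by a CRC32 checksum.
 * Legitimate writes refresh the checksum; a simulated transient fault does not,
 * so a later integrity check can detect the silent corruption.
 */
final class GuardedRegion {
    private final byte[] data;
    private long checksum;

    GuardedRegion(byte[] initial) {
        this.data = initial.clone();
        this.checksum = compute();
    }

    /** Recomputes the CRC32 over the whole region. */
    private long compute() {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Legitimate write: updates the data and refreshes the stored checksum. */
    void write(int index, byte value) {
        data[index] = value;
        checksum = compute();
    }

    /** Simulated soft error: flips one bit without touching the stored checksum. */
    void injectBitFlip(int bitIndex) {
        data[bitIndex / 8] ^= (byte) (1 << (bitIndex % 8));
    }

    /** Returns true if the region still matches its stored checksum. */
    boolean isIntact() {
        return compute() == checksum;
    }

    public static void main(String[] args) {
        GuardedRegion region = new GuardedRegion(new byte[64]);
        region.write(3, (byte) 42);
        System.out.println("intact after legitimate write: " + region.isIntact());
        region.injectBitFlip(100);
        System.out.println("intact after injected bit flip: " + region.isIntact());
    }
}
```

Recomputing a checksum over the whole region on every write is the simplest possible policy and would be costly for large structures; a practical design would have to choose which structures to guard and when to verify them. That trade-off is consistent with detection coverage falling short of 100%.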