The IBM System z10™ server reliability, availability, and serviceability (RAS) design continues to reduce the sources of server outages through innovative RAS architecture and techniques. The z10™ server introduced functional improvements that challenged the RAS design. Increases were made in the performance of each processor, the total number of processors, the total size of the memory, the amount of cache, the bandwidth of the I/O, the thermal density, and the exposure to soft errors. These changes demanded stronger RAS functions to prevent unscheduled outages. Significant improvements were made to the IBM e-business on demand® functions (concurrent, customer-requested upgrades) that enable customers to better manage capacity without having to take planned outages. The hypervisor simplified configuration changes, such as adding cryptography or channel subsystems to logical partitions, by eliminating the need for preplanning. Single-core checkstopping and single transparent CPU (central processing unit) sparing were added. The RAS functions reduced the number of scheduled outages. Product improvements were complemented by improvements in RAS modeling. This paper describes these RAS improvements and how they provide value to the customer.