This paper compares and contrasts the design philosophies and implementations of two computer system families: the IBM S/360 and its evolution to the current zSeries line, and the Tandem (now HP) NonStop® Server. Both systems have a long history; the initial IBM S/360 machines were shipped in 1964, and the Tandem NonStop System was first shipped in 1976. They were aimed at similar markets, which today would be called enterprise-class applications. The requirement for the original S/360 line was for very high availability; the requirement for the NonStop platform was for single fault tolerance against unplanned outages. Since their initial shipments, availability expectations for both platforms have continued to rise, and the system designers and developers have been challenged to keep up. There were, and still are, many similarities in the design philosophies of the two lines, including the use of redundant components and extensive error checking. The primary difference is that the S/360-zSeries focus has been on localized retry and restore to keep processors functioning as long as possible, while the NonStop developers have based systems on a loosely coupled multiprocessor design that supports a "fail-fast" philosophy implemented through a combination of hardware and software, with the workload being actively taken over by another resource when one fails.