Multiprocessor Organization—a Survey
ACM Computing Surveys (CSUR)
The nucleus of a multiprogramming system
Communications of the ACM
The structure of the “THE”-multiprogramming system
Communications of the ACM
Computer Structures: Principles and Examples
Computer Structures: Principles and Examples
Tailor: A simple model that works
SIGMETRICS '79 Proceedings of the 1979 ACM SIGMETRICS conference on Simulation, measurement and modeling of computer systems
XRAY: Instrumentation for multiple computers
PERFORMANCE '80 Proceedings of the 1980 international symposium on Computer performance modelling, measurement and evaluation
Analysis of a composite performance reliability measure for fault-tolerant systems
Journal of the ACM (JACM)
The alchemy model: a model for homogeneous and heterogeneous distributed computing system
ACM SIGOPS Operating Systems Review
On coupling many small systems for transaction processing
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
IBM Systems Journal
A weighted voting algorithm for replicated directories
Journal of the ACM (JACM)
Progressive transaction recovery in distributed DB/DC systems
IEEE Transactions on Computers - Special Issue on Real-Time Systems
Recovery management in QuickSilver
ACM Transactions on Computer Systems (TOCS)
Distributed logging for transaction processing
SIGMOD '87 Proceedings of the 1987 ACM SIGMOD international conference on Management of data
A benchmark of NonStop SQL on the debit credit transaction
SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
High performance SQL through low-level system integration
SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
ACM Transactions on Computer Systems (TOCS)
Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems
PODC '88 Proceedings of the seventh annual ACM Symposium on Principles of distributed computing
Fault-tolerant computing based on Mach
ACM SIGOPS Operating Systems Review
Understanding fault-tolerant distributed systems
Communications of the ACM
Performance of a mirrored disk in a real-time transaction system
SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
On robust transaction routing and load sharing
ACM Transactions on Database Systems (TODS)
Replication in the harp file system
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
An implementation for small databases with high availability
ACM SIGOPS Operating Systems Review
High-Availability Computer Systems
Computer
Effect of Fault Tolerance on Response Time-Analysis of the Primary Site Approach
IEEE Transactions on Computers
VAXcluster: a closely-coupled distributed system
ACM Transactions on Computer Systems (TOCS)
Hypervisor-based fault tolerance
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Hypervisor-based fault tolerance
ACM Transactions on Computer Systems (TOCS) - Special issue on operating system principles
Design and Evaluation of a Window-Consistent Replication Service
IEEE Transactions on Computers
Cluster-based scalable network services
Proceedings of the sixteenth ACM symposium on Operating systems principles
Progressive Retry for Software Failure Recovery in Message-Passing Applications
IEEE Transactions on Computers
Efficient transparent application recovery in client-server information systems
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Interface and execution models in the Fluke kernel
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Fast cluster failover using virtual memory-mapped communication
ICS '99 Proceedings of the 13th international conference on Supercomputing
A Real-Time Primary-Backup Replication Service
IEEE Transactions on Parallel and Distributed Systems
Reconfiguration Models and Algorithms for Stateful Interactive Processes
IEEE Transactions on Software Engineering
Replicated distributed programs
Proceedings of the tenth ACM symposium on Operating systems principles
Replication and fault-tolerance in the ISIS system
Proceedings of the tenth ACM symposium on Operating systems principles
Transactions and synchronization in a distributed operating system
Proceedings of the tenth ACM symposium on Operating systems principles
Replication in distributed systems: the Eden experience
ACM '86 Proceedings of 1986 ACM Fall joint computer conference
An architecture for high volume transaction processing
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Performing remote operations efficiently on a local computer network
Communications of the ACM
Increasing relevance of memory hardware errors: a case for recoverable programming models
EW 9 Proceedings of the 9th workshop on ACM SIGOPS European workshop: beyond the PC: new challenges for the operating system
A survey of rollback-recovery protocols in message-passing systems
ACM Computing Surveys (CSUR)
Reliability Through Consistency
IEEE Software
Tradeoffs Between Coupling Small and Large Processors for Transaction Processing
IEEE Transactions on Computers
A taxonomy of scheduling in general-purpose distributed computing systems
IEEE Transactions on Software Engineering
Resource Allocation for Primary-Site Fault-Tolerant Systems
IEEE Transactions on Software Engineering
Software Bottlenecking in Client-Server Systems and Rendezvous Networks
IEEE Transactions on Software Engineering
Efficient Testing of High Performance Transaction Processing Systems
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Robustness to Crash in a Distributed Database: A Non Shared-memory Multi-Processor Approach
VLDB '84 Proceedings of the 10th International Conference on Very Large Data Bases
VLDB '91 Proceedings of the 17th International Conference on Very Large Data Bases
Integrating Reliable Memory in Databases
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
An Efficient Universal Construction for Message-Passing Systems
DISC '02 Proceedings of the 16th International Conference on Distributed Computing
Integrating reliable memory in databases
The VLDB Journal — The International Journal on Very Large Data Bases
MicroTAL - a machine-dependent, high-level microprogramming language
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
The LOCUS distributed operating system
SOSP '83 Proceedings of the ninth ACM symposium on Operating systems principles
A message system supporting fault tolerance
SOSP '83 Proceedings of the ninth ACM symposium on Operating systems principles
Publishing: a reliable broadcast communication mechanism
SOSP '83 Proceedings of the ninth ACM symposium on Operating systems principles
PODC '84 Proceedings of the third annual ACM symposium on Principles of distributed computing
Highly Available Process Support Systems: Implementing Backup Mechanisms
SRDS '99 Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems
Evaluation of Software Dependability Based on Stability Test Data
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
A Flexible ServerNet-Based Fault-Tolerant Architecture
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Implementing Fault-Tolerant Applications Using Reflective Object-Oriented Programming
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Improving Logging and Recovery Performance in Phoenix/App
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Improving availability with recursive microreboots: a soft-state system case study
Performance Evaluation - Dependable systems and networks-performance and dependability symposium (DSN-PDS) 2002: Selected papers
Recovery guarantees for Internet applications
ACM Transactions on Internet Technology (TOIT)
Commercial Fault Tolerance: A Tale of Two Systems
IEEE Transactions on Dependable and Secure Computing
Susceptibility of Commodity Systems and Software to Memory Soft Errors
IEEE Transactions on Computers
Rx: treating bugs as allergies---a safe method to survive software failures
Proceedings of the twentieth ACM symposium on Operating systems principles
Autonomous recovery in componentized Internet applications
Cluster Computing
ACM Transactions on Computer Systems (TOCS)
Exploring failure transparency and the limits of generic recovery
OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
JVM susceptibility to memory errors
JVM'01 Proceedings of the 2001 Symposium on Java™ Virtual Machine Research and Technology Symposium - Volume 1
Rx: Treating bugs as allergies—a safe method to survive software failures
ACM Transactions on Computer Systems (TOCS)
The transaction concept: virtues and limitations (invited paper)
VLDB '81 Proceedings of the seventh international conference on Very Large Data Bases - Volume 7
Transaction monitoring in ENCOMPASS: reliable distributed transaction processing
VLDB '81 Proceedings of the seventh international conference on Very Large Data Bases - Volume 7
Tolerating byzantine faults in transaction processing systems using commit barrier scheduling
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Diverse replication for single-machine Byzantine-fault tolerance
ATC'08 USENIX 2008 Annual Technical Conference
Implementing high availability memory with a duplication cache
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Tolerating hardware device failures in software
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
System support for scalable and fault tolerant internet services
Middleware '98 Proceedings of the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing
CuriOS: improving reliability through operating system structure
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Reliable distributed data stream management in mobile environments
Information Systems
Managing self-inflicted nondeterminism
HotDep'05 Proceedings of the First conference on Hot topics in system dependability
Evaluating the viability of process replication reliability for exascale systems
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Unstoppable stateful PHP web services
WISE'06 Proceedings of the 7th international conference on Web Information Systems
Efficient and coordinated checkpointing for reliable distributed data stream management
ADBIS'06 Proceedings of the 10th East European conference on Advances in Databases and Information Systems
Research: Designing a system infrastructure for distributed programs
Computer Communications
Operating system support for redundant multithreading
Proceedings of the tenth ACM international conference on Embedded software
A survey of checker architectures
ACM Computing Surveys (CSUR)
The Tandem NonStop System is a fault-tolerant [1], expandable, and distributed computer system designed expressly for online transaction processing. This paper describes the key primitives of the kernel of the operating system. The first section describes the basic hardware building blocks and introduces their software analogs: processes and messages. Using these primitives, a mechanism that allows fault-tolerant resource access, the process-pair, is described. The paper concludes with some observations on this type of system structure and on actual use of the system.