A NonStop kernel

Authors:
Joel F. Bartlett
Affiliations:
-
Venue:
SOSP '81 Proceedings of the eighth ACM symposium on Operating systems principles
Year:
1981

Citing 6
Cited 89

Multiprocessor Organization—a Survey

ACM Computing Surveys (CSUR)
The nucleus of a multiprogramming system

Communications of the ACM
The structure of the “THE”-multiprogramming system

Communications of the ACM
Computer Structures: Principles and Examples

Computer Structures: Principles and Examples
Tailor: A simple model that works

SIGMETRICS '79 Proceedings of the 1979 ACM SIGMETRICS conference on Simulation, measurement and modeling of computer systems
XRAY: Instrumentation for multiple computers

PERFORMANCE '80 Proceedings of the 1980 international symposium on Computer performance modelling, measurement and evaluation

Analysis of a composite performance reliability measure for fault-tolerant systems

Journal of the ACM (JACM)
The alchemy model: a model for homogeneous and heterogeneous distributed computing system

ACM SIGOPS Operating Systems Review
On coupling many small systems for transaction processing

ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Database technology

IBM Systems Journal
A weighted voting algorithm for replicated directories

Journal of the ACM (JACM)
Progressive transaction recovery in distributed DB/DC systems

IEEE Transactions on Computers - Special Issue on Real-Time Systems
Recovery management in QuickSilver

ACM Transactions on Computer Systems (TOCS)
Distributed logging for transaction processing

SIGMOD '87 Proceedings of the 1987 ACM SIGMOD international conference on Management of data
Sequoia: A Fault-Tolerant Tightly Coupled Multiprocessor for Transaction Processing

Computer
A benchmark of NonStop SQL on the debit credit transaction

SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
High performance SQL through low-level system integration

SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
Fault tolerance under UNIX

ACM Transactions on Computer Systems (TOCS)
Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems

PODC '88 Proceedings of the seventh annual ACM Symposium on Principles of distributed computing
Fault-tolerant computing based on Mach

ACM SIGOPS Operating Systems Review
Understanding fault-tolerant distributed systems

Communications of the ACM
Performance of a mirrored disk in a real-time transaction system

SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
On robust transaction routing and load sharing

ACM Transactions on Database Systems (TODS)
Replication in the harp file system

SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
An implementation for small databases with high availability

ACM SIGOPS Operating Systems Review
High-Availability Computer Systems

Computer
Effect of Fault Tolerance on Response Time-Analysis of the Primary Site Approach

IEEE Transactions on Computers
VAXcluster: a closely-coupled distributed system

ACM Transactions on Computer Systems (TOCS)
Hypervisor-based fault tolerance

SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Hypervisor-based fault tolerance

ACM Transactions on Computer Systems (TOCS) - Special issue on operating system principles
Design and Evaluation of a Window-Consistent Replication Service

IEEE Transactions on Computers
Cluster-based scalable network services

Proceedings of the sixteenth ACM symposium on Operating systems principles
Progressive Retry for Software Failure Recovery in Message-Passing Applications

IEEE Transactions on Computers
Efficient transparent application recovery in client-server information systems

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Interface and execution models in the Fluke kernel

OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Fast cluster failover using virtual memory-mapped communication

ICS '99 Proceedings of the 13th international conference on Supercomputing
A Real-Time Primary-Backup Replication Service

IEEE Transactions on Parallel and Distributed Systems
Reconfiguration Models and Algorithms for Stateful Interactive Processes

IEEE Transactions on Software Engineering
Replicated distributed programs

Proceedings of the tenth ACM symposium on Operating systems principles
Replication and fault-tolerance in the ISIS system

Proceedings of the tenth ACM symposium on Operating systems principles
Transactions and synchronization in a distributed operating system

Proceedings of the tenth ACM symposium on Operating systems principles
Replication in distributed systems: the Eden experience

ACM '86 Proceedings of 1986 ACM Fall joint computer conference
An architecture for high volume transaction processing

ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Performing remote operations efficiently on a local computer network

Communications of the ACM
Increasing relevance of memory hardware errors: a case for recoverable programming models

EW 9 Proceedings of the 9th workshop on ACM SIGOPS European workshop: beyond the PC: new challenges for the operating system
A survey of rollback-recovery protocols in message-passing systems

ACM Computing Surveys (CSUR)
Processor and Memory-Based Checkpoint and Rollback Recovery

Computer
Reliability Through Consistency

IEEE Software
Tradeoffs Between Coupling Small and Large Processors for Transaction Processing

IEEE Transactions on Computers
A taxonomy of scheduling in general-purpose distributed computing systems

IEEE Transactions on Software Engineering
Resource Allocation for Primary-Site Fault-Tolerant Systems

IEEE Transactions on Software Engineering
Software Bottlenecking in Client-Server Systems and Rendezvous Networks

IEEE Transactions on Software Engineering
Efficient Testing of High Performance Transaction Processing Systems

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Robustness to Crash in a Distributed Database: A Non Shared-memory Multi-Processor Approach

VLDB '84 Proceedings of the 10th International Conference on Very Large Data Bases
Using Write Protected Data Structures To Improve Software Fault Tolerance in Highly Available Database Management Systems

VLDB '91 Proceedings of the 17th International Conference on Very Large Data Bases
Integrating Reliable Memory in Databases

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
An Efficient Universal Construction for Message-Passing Systems

DISC '02 Proceedings of the 16th International Conference on Distributed Computing
Integrating reliable memory in databases

The VLDB Journal — The International Journal on Very Large Data Bases
MicroTAL - a machine-dependent, high-level microprogramming language

MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
The LOCUS distributed operating system

SOSP '83 Proceedings of the ninth ACM symposium on Operating systems principles
A message system supporting fault tolerance

SOSP '83 Proceedings of the ninth ACM symposium on Operating systems principles
Publishing: a reliable broadcast communication mechanism

SOSP '83 Proceedings of the ninth ACM symposium on Operating systems principles
Replicated procedure call

PODC '84 Proceedings of the third annual ACM symposium on Principles of distributed computing
Highly Available Process Support Systems: Implementing Backup Mechanisms

SRDS '99 Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems
Evaluation of Software Dependability Based on Stability Test Data

FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
A Flexible ServerNet-Based Fault-Tolerant Architecture

FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Implementing Fault-Tolerant Applications Using Reflective Object-Oriented Programming

FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Improving Logging and Recovery Performance in Phoenix/App

ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Improving availability with recursive microreboots: a soft-state system case study

Performance Evaluation - Dependable systems and networks-performance and dependability symposium (DSN-PDS) 2002: Selected papers
Recovery guarantees for Internet applications

ACM Transactions on Internet Technology (TOIT)
Commercial Fault Tolerance: A Tale of Two Systems

IEEE Transactions on Dependable and Secure Computing
Susceptibility of Commodity Systems and Software to Memory Soft Errors

IEEE Transactions on Computers
Rx: treating bugs as allergies---a safe method to survive software failures

Proceedings of the twentieth ACM symposium on Operating systems principles
Autonomous recovery in componentized Internet applications

Cluster Computing
Recovering device drivers

ACM Transactions on Computer Systems (TOCS)
Exploring failure transparency and the limits of generic recovery

OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
Recovering device drivers

OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
JVM susceptibility to memory errors

JVM'01 Proceedings of the 2001 Symposium on Java™ Virtual Machine Research and Technology Symposium - Volume 1
Rx: Treating bugs as allergies—a safe method to survive software failures

ACM Transactions on Computer Systems (TOCS)
The transaction concept: virtues and limitations (invited paper)

VLDB '81 Proceedings of the seventh international conference on Very Large Data Bases - Volume 7
Transaction monitoring in ENCOMPASS: reliable distributed transaction processing

VLDB '81 Proceedings of the seventh international conference on Very Large Data Bases - Volume 7
Tolerating byzantine faults in transaction processing systems using commit barrier scheduling

Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Diverse replication for single-machine Byzantine-fault tolerance

ATC'08 USENIX 2008 Annual Technical Conference
Implementing high availability memory with a duplication cache

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Tolerating hardware device failures in software

Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
System support for scalable and fault tolerant internet services

Middleware '98 Proceedings of the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing
CuriOS: improving reliability through operating system structure

OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Reliable distributed data stream management in mobile environments

Information Systems
Managing self-inflicted nondeterminism

HotDep'05 Proceedings of the First conference on Hot topics in system dependability
Evaluating the viability of process replication reliability for exascale systems

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Unstoppable stateful PHP web services

WISE'06 Proceedings of the 7th international conference on Web Information Systems
Efficient and coordinated checkpointing for reliable distributed data stream management

ADBIS'06 Proceedings of the 10th East European conference on Advances in Databases and Information Systems
Research: Designing a system infrastructure for distributed programs

Computer Communications
Operating system support for redundant multithreading

Proceedings of the tenth ACM international conference on Embedded software
A survey of checker architectures

ACM Computing Surveys (CSUR)

Abstract

The Tandem NonStop System is a fault-tolerant [1], expandable, and distributed computer system designed expressly for online transaction processing. This paper describes the key primitives of the kernel of the operating system. The first section describes the basic hardware building blocks and introduces their software analogs: processes and messages. Using these primitives, a mechanism that allows fault-tolerant resource access, the process-pair, is described. The paper concludes with some observations on this type of system structure and on actual use of the system.
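
As an illustration only, the process-pair idea named in the abstract can be sketched as two processes that share no memory: the primary serves requests and sends a checkpoint message to the backup before replying, and the backup takes over from the last checkpoint if the primary fails. The class names, method names, and the dictionary-based state below are hypothetical and are not taken from the paper or from the Tandem kernel interface.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """One member of a hypothetical process-pair, with private state."""
    name: str
    state: dict = field(default_factory=dict)
    alive: bool = True

    def receive_checkpoint(self, snapshot: dict) -> None:
        # A checkpoint arrives as a message: store a copy, never a shared reference.
        self.state = dict(snapshot)


class ProcessPair:
    """Primary handles requests; backup holds the last checkpointed state."""

    def __init__(self) -> None:
        self.primary = Process("A")
        self.backup = Process("B")

    def fail_primary(self) -> None:
        # Simulate the failure of the processor running the primary.
        self.primary.alive = False

    def take_over(self) -> None:
        # Promote the backup; it resumes from the last checkpoint it received.
        self.primary, self.backup = self.backup, self.primary

    def request(self, key: str, value: int) -> str:
        if not self.primary.alive:
            self.take_over()
        self.primary.state[key] = value
        if self.backup.alive:
            # Checkpoint before replying so the backup can continue after a failure.
            self.backup.receive_checkpoint(self.primary.state)
        return f"ok from {self.primary.name}"


pair = ProcessPair()
first_reply = pair.request("balance", 100)
pair.fail_primary()
second_reply = pair.request("balance", 150)
```

In this sketch the caller never addresses process A or B directly; it addresses the pair, which is the sense in which access to the resource stays available across a single failure. A real system would also have to detect the failure and start a new backup, which this sketch leaves out.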