Global memory management for a multi computer system

Authors:
Dejan Milojicic; Steve Hoyle; Alan Messer; Albert Munoz; Lance Russell; Tom Wylegala; Vivekanand Vellanki; Stephen Childs
Affiliations:
HP Labs; HP Labs; HP Labs; HP Labs; HP Labs; HP Labs; Georgia Tech; Cambridge University
Venue:
WSS'00 Proceedings of the 4th conference on USENIX Windows Systems Symposium - Volume 4
Year:
2000

Abstract

In this paper, we discuss the design and implementation of fault-aware Global Memory Management (GMM) for a multikernel architecture. The scalability of today's systems is limited by SMP hardware and by the underlying commodity operating systems (OSs), such as Microsoft Windows or Linux. High availability is limited by insufficiently robust software and by hardware failures. Improving scalability and high availability is the main motivation for a multikernel architecture, and GMM plays a key role in achieving both. In our design, we extend the underlying OS with GMM supported by a set of software failure recovery modules in the form of device drivers. While the underlying OS manages the virtual address space and the local physical address space, the GMM module manages the global physical address space. We describe the GMM design, its prototype implementation, and the use of GMM.