To execute high-performance applications on a cluster, it is highly desirable to provide distributed services that globally manage the physical resources spread across the cluster nodes. However, because a distributed service may use resources located on different nodes, it becomes sensitive to changes in the cluster configuration caused by node addition, reboot, or failure. In this paper, we propose a generic service that performs dynamic resource management in a cluster in order to provide distributed services with high availability. This service has been implemented in the Gobelins cluster operating system. It makes node addition and reboot nearly transparent to all distributed services of Gobelins and, as a consequence, fully transparent to applications. In the event of a node failure, applications using resources located on the failed node must be restarted from a previously saved checkpoint, but the availability of the cluster operating system is guaranteed, provided that its distributed services implement reconfiguration features.