The Beowulf Distributed Process Space (BProc) is a set of Linux kernel modifications that provides a single system image and process migration facilities for processes running in a Beowulf-style cluster. With BProc, all processes running in the cluster are visible on the front-end machine and are controllable via existing UNIX process control mechanisms. Processes are created on the front-end machine and then placed on the nodes where they will run using BProc's process migration mechanism. Together, these two features greatly simplify creating and cleaning up parallel jobs, and they remove the need for users to log in to remote nodes in the cluster. Removing the need for user logins drastically reduces the amount of software required on cluster nodes. Job startup with BProc's process migration mechanism is faster than the traditional method of logging into a node and starting the process with rsh. BProc does not affect the file or network I/O of processes running on remote nodes, so the vast majority of MPI applications experience no performance loss as a result of being managed by BProc.
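As a concrete illustration of this process model, the sketch below shows how a front-end program might start one worker per node and then manage the migrated children with ordinary UNIX calls. It assumes the libbproc interface (sys/bproc.h) with bproc_numnodes(), bproc_rfork(), and bproc_currnode(), as described in the BProc literature; exact names and signatures vary between BProc releases, so this is a hedged sketch rather than a definitive API reference.

/* Hedged sketch of BProc-style job startup from the front end.
 * Assumption: libbproc provides bproc_numnodes(), bproc_rfork(),
 * and bproc_currnode(); details may differ between BProc versions. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/bproc.h>

int main(void) {
    int nnodes = bproc_numnodes();   /* number of slave nodes known to the master */
    if (nnodes < 0) {
        perror("bproc_numnodes");
        return EXIT_FAILURE;
    }

    for (int node = 0; node < nnodes; node++) {
        /* bproc_rfork() behaves like fork(), except the child is migrated
         * to the given node; no rsh login or per-node daemon is required. */
        pid_t pid = bproc_rfork(node);
        if (pid < 0) {
            perror("bproc_rfork");
        } else if (pid == 0) {
            /* Child: now running on the remote node, yet still visible in
             * the front end's process space. */
            printf("worker pid %d on node %d\n", (int)getpid(), bproc_currnode());
            _exit(EXIT_SUCCESS);
        }
    }

    /* Parent: standard UNIX process control applies to the migrated
     * children, e.g. wait(), kill(), or inspecting /proc on the front end. */
    while (wait(NULL) > 0)
        ;
    return EXIT_SUCCESS;
}

In this model the cleanup path is the ordinary one: because every worker remains a child of the front-end process, killing or reaping the job needs no per-node login, which is exactly the simplification the abstract describes.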